Logo Data Structures and Algorithms with Object-Oriented Design Patterns in Java
next up previous contents index

AVL Search Trees

 

The problem with binary search trees is that while the average running times for search, insertion, and withdrawal operations are all tex2html_wrap_inline58920, any one operation is still O(n) in the worst case. This is so because we cannot say anything in general about the shape of the tree.

For example, consider the two binary search trees shown Figure gif. Both trees contain the same set of keys. The tree tex2html_wrap_inline62382 is obtained by starting with an empty tree and inserting the keys in the following order

displaymath63774

The tree tex2html_wrap_inline62384 is obtained by starting with an empty tree and inserting the keys in this order

displaymath63775

Clearly, tex2html_wrap_inline62384 is a better tree search tree than tex2html_wrap_inline62382. In fact, since tex2html_wrap_inline62384 is a perfect binary tree , its height is tex2html_wrap_inline63800. Therefore, all three operations, search, insertion, and withdrawal, have the same worst case asymptotic running time, tex2html_wrap_inline58920.

The reason that tex2html_wrap_inline62384 is better than tex2html_wrap_inline62382 is that it is the more balanced tree. If we could ensure that the search trees we construct are balanced then the worst-case running time of search, insertion, and withdrawal, could be made logarithmic rather than linear. But under what conditions is a tree balanced?

If we say that a binary tree is balanced if the left and right subtrees of every node have the same height, then the only trees which are balanced are the perfect binary trees. A perfect binary tree of height h has exactly tex2html_wrap_inline62770 internal nodes. Therefore, it is only possible to create perfect trees with n nodes for tex2html_wrap_inline63814. Clearly, this is an unsuitable balance condition because it is not possible to create a balanced tree for every n.

What are the characteristics of a good balance condition ?

  1. A good balance condition ensures that the height of a tree with n nodes is tex2html_wrap_inline58920.
  2. A good balance condition can be maintained efficiently. That is, the additional work necessary to balance the tree when an item is inserted or deleted is O(1).
Adelson-Velskii and Landisgif were the first to propose the following balance condition and show that it has the desired characteristics.

Definition (AVL Balance Condition)  An empty binary tree is AVL balanced  . A non-empty binary tree, tex2html_wrap_inline62746, is AVL balanced if both tex2html_wrap_inline62742 and tex2html_wrap_inline62744 are AVL balanced and

displaymath63776

where tex2html_wrap_inline63830 is the height of tex2html_wrap_inline62742 and tex2html_wrap_inline63834 is the height of tex2html_wrap_inline62744.

Clearly, all perfect binary trees are AVL balanced. What is not so clear is that heights of all trees that satisfy the AVL balance condition are logarithmic in the number of internal nodes.

Theorem  The height, h, of an AVL balanced tree with n internal nodes satisfies

displaymath63777

extbfProof The lower bound follows directly from Theorem gif. It is in fact true for all binary trees regardless of whether they are AVL balanced.

To determine the upper bound, we turn the problem around and ask the question, what is the minimum number of internal nodes in an AVL balanced tree of height h?

Let tex2html_wrap_inline63844 represent an AVL balanced tree of height h which has the smallest possible number of internal nodes, say tex2html_wrap_inline63848. Clearly, tex2html_wrap_inline63844 must have at least one subtree of height h-1 and that subtree must be tex2html_wrap_inline63854. To remain AVL balanced, the other subtree can have height h-1 or h-2. Since we want the smallest number of internal nodes, it must be tex2html_wrap_inline63860. Therefore, the number of internal nodes in tex2html_wrap_inline63844 is tex2html_wrap_inline63864, where tex2html_wrap_inline63866.

Clearly, tex2html_wrap_inline62596 contains a single internal node, so tex2html_wrap_inline63870. Similarly, tex2html_wrap_inline62362 contains exactly two nodes, so tex2html_wrap_inline63874. Thus, tex2html_wrap_inline63848 is given by the recurrence

  equation19120

The remarkable thing about Equation gif is its similarity with the definition of Fibonacci numbers  (Equation gif). In fact, it can easily be shown by induction that

displaymath63778

for all tex2html_wrap_inline62658, where tex2html_wrap_inline63880 is the tex2html_wrap_inline61080 Fibonacci number.

Base Cases

eqnarray19134

Inductive Hypothesis Assume that tex2html_wrap_inline63884 for tex2html_wrap_inline62668. Then

eqnarray19138

Therefore, by induction on k, tex2html_wrap_inline63884, for all tex2html_wrap_inline62658.

According to Theorem gif, the Fibonacci numbers are given by

displaymath61390

where tex2html_wrap_inline59512 and tex2html_wrap_inline59514. Furthermore, since tex2html_wrap_inline63898, tex2html_wrap_inline63900.

Therefore,

eqnarray19158

This completes the proof of the upper bound.

So, we have shown that the AVL balance condition satisfies the first criterion of a good balance condition--the height of an AVL balanced tree with n internal nodes is tex2html_wrap_inline63904. What remains to be shown is that the balance condition can be efficiently maintained. To see that it can, we need to look at an implementation.