To insert a word-set in
, we start at the root and hash on successive items to follow the
appropriate child. At depth d we hash on the d-th item in
the word-set. We do this until we reach a leaf. The new word-set is then
inserted in the linked list of that leaf.
If the number of word-sets in that
leaf exceeds a threshold value, that node is converted into an
internal node, i.e., we create a new hash table for that node, and
then rehash (on the d-th item) the list of word-sets so that they
go to appropriate children. There is one exception where we do not
convert a leaf node to an internal node even if the threshold is
exceeded. This is the case if the depth of the hash tree would
exceed the length of the word-sets (which is the same as the
iteration number), i.e., we insure that the maximum depth of the
tree in iteration k is k.
The hash table size of the hash tree for each internal node should
be chosen as follows:
Let's look at the hash tree that would be generated in iteration 3 in
our example above. First there are 8 word-sets in . Therefore the hash table size is given as
We will use the hash
function for each hash table in an internal node. The
figure below shows how the tree grows as the candidates are added to
it.
The tree starts out as the root being a leaf which will store word-sets. We insert 123, and then 124, as shown.
When we try to insert 125, we see that the threshold T=2 for the leaf capacity is exceeded, at this point we have to turn the root into an internal node with a hash table of size 3. We then have to rehash 123, 124, and 125 based on their first item (1). In this case all three have the same first item, and they end up in the same leaf list. Since the capacity of the leaf is exceeded, we again have to rehash, turning the leaf node into an internal node. But this time we hash based on the second element (2). Once again all word-sets end up in the same leaf. For the third time, we rehash based on the third element of the word-sets. For 123, the third element is 3, and it goes into cell 0; similarly, 124 and 125 end up in leaf pointed to by cell 1 and 2, respectively.
We add the next wordset, 134 at the right spot by first hashing on 1 and then on 3. Finally for 234, when we hash on 2 we hit a leaf and we put 234 there.