next up previous
Next: Iteration 5: Count Word Up: Finding Common Key Words Previous: Iteration 3: Count Word

Iteration 4: Count Word Sets of Length 4

We use the procedure described in the previous iteration to generate a new ${\cal C}_4$ from ${\cal F}_3$, to get ${\cal C}_4 = \{1234, 1235, 1245 \}$, i.e., we consider all pairs from ${\cal F}_3$, and generate a set of length 4 by taking their union if they share a common prefix. For example, if the pairs are 123 and 124, then the common prefix is 12, and the result is 1234. If the pairs are 123 and 234, then there is no result since they don't share a common prefix.

Next we prune a candidate if any of its length 3 subsets is not in ${\cal F}_3$. For example 123, 125, 135, and 235 are the four length 3 subsets of 1235. We see that 135 doesn't appear in ${\cal F}_3$. Thus we discard the candidate 1235. We also discard 1245 for the same reason. The final candidate set is ${\cal C}_4 = \{1234\}$.

Now we count the candidates:

   Candidates:     1234
   --------------------
   Count  :        3
Since this is frequent, we get ${\cal F}_4 = \{1234 \}$.



Mohammed Zaki
10/30/1998