next up previous
Next: Iteration 4: Count Word Up: Finding Common Key Words Previous: Iteration 2: Count Word

Iteration 3: Count Word Sets of Length 3

The set ${\cal F}_2$ is the starting point for constructing new candidates of length 3. We look at all distinct pairs of members of ${\cal F}_2$, and form a new set of length 3 by taking a union of the pairs, as long as they share a common prefix, i.e., they have all elements in common except the last element. For example, if the two members are 12 and 13 then the common prefix is 1, and the resulting candidate is 123, but if the two pairs are 12 and 23 then we don't generate anything, since 12 and 23 have different prefixes. If we apply this procedure for all pairs we get the following candidate set, ${\cal C}_3 = \{ 123, 124, 125, 134, 135, 145, 234, 235, 245 \}$

After generating ${\cal C}_3$, we apply a pruning step, i.e., for each candidate we check if all of its length 2 subsets are members of ${\cal F}_2$. If we don't find a subset in ${\cal F}_2$, we discard that candidate, since it cannot be frequent (note: one can easily prove that if a candidate is frequent, then all its subsets must be frequent). For example, consider the set 135. It has 3 subsets of length 2, namely 13, 15 and 35. We see that 35 doesn't belong to ${\cal F}_2$. Thus we remove 135 from ${\cal C}_3$. Similarly, we remove 145, 235, and 245. After pruning the new ${\cal C}_3 = \{ 123, 124, 125, 134, 234 \}$.

We now count the occurrence of the candidates:

   Candidates:     123    124     125     134     234 
   ---------------------------------------------------
   Count  :        4       3       4       3       3

We find that all of these are frequent. Thus ${\cal F}_3 = \{123, 124, 125, 134, 234\}$


next up previous
Next: Iteration 4: Count Word Up: Finding Common Key Words Previous: Iteration 2: Count Word
Mohammed Zaki
10/30/1998