Next: Iteration 4: Count Word
Up: Finding Common Key Words
Previous: Iteration 2: Count Word
The set
is the starting point for constructing new
candidates of length 3. We look at all distinct pairs of members of
, and form a new set of length 3 by taking a union of the
pairs, as long as they share a common prefix, i.e., they have all
elements in common except the last element. For example, if the two
members are 12 and 13 then the common prefix is 1, and the resulting
candidate is 123, but if the
two pairs are 12 and 23 then we don't generate anything, since 12 and
23 have different prefixes. If we apply this procedure for all pairs
we get the following candidate set,

After generating
, we apply a pruning step, i.e.,
for each candidate we check if all of its length 2 subsets are members
of
. If we don't find a subset in
, we discard
that candidate, since it cannot be frequent (note: one can easily
prove that if a candidate is frequent, then all its subsets must be
frequent). For example, consider the set 135. It has 3 subsets of
length 2, namely 13, 15 and 35. We see that 35 doesn't belong to
. Thus we remove 135 from
. Similarly, we
remove 145, 235, and 245. After pruning the new
.
We now count the occurrence of the candidates:
Candidates: 123 124 125 134 234
---------------------------------------------------
Count : 4 3 4 3 3
We find that all of these are frequent. Thus

Next: Iteration 4: Count Word
Up: Finding Common Key Words
Previous: Iteration 2: Count Word
Mohammed Zaki
10/30/1998