
Iteration 1: Count Word Sets of Length 1

In the first iteration we count word sets of length 1. We first form the list of candidate word sets to be counted in the current iteration. For iteration 1, the candidate set, denoted ${\cal C}_1$, is the set of all unique words under consideration, i.e., ${\cal C}_1 = \{1, 2, 3, 4, 5\}$.

The next step is to count the number of times each candidate occurs in the database. We get the following result:

   Candidates:     1       2       3       4       5 
   ---------------------------------------------------
   Count  :        6       6       4       3       4

For example, the table above shows that word 1 appears in all 6 web pages. At this stage, any word set that is not frequent (i.e., occurs fewer than 3 times) is dropped from consideration. The remaining word sets form the set of frequent word sets of length 1, denoted ${\cal F}_1$. In this example every candidate occurs at least 3 times, so none is dropped and ${\cal F}_1 = \{1, 2, 3, 4, 5\}$.
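
The counting and pruning steps of iteration 1 can be sketched in Python as follows. This is only an illustration: the `pages` list is a hypothetical database, made up here so that its counts match the table above, and is not the actual set of web pages used in this example. The names `MIN_SUPPORT`, `count_candidates`, `candidates_1`, `counts_1` and `frequent_1` are likewise assumptions introduced for the sketch.

```python
from collections import Counter

# Minimum number of pages a word set must occur in to be "frequent"
# (the threshold of 3 is taken from the text).
MIN_SUPPORT = 3

# HYPOTHETICAL database: one set of word ids per web page. It was chosen
# only so that the counts agree with the table (6, 6, 4, 3, 4).
pages = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 5},
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 4, 5},
    {1, 2},
]


def count_candidates(pages, candidates):
    """Count, for each candidate word, the number of pages containing it."""
    counts = Counter()
    for page in pages:
        for word in candidates:
            if word in page:
                counts[word] += 1
    return counts


# C_1: all unique words under consideration.
candidates_1 = sorted(set().union(*pages))

# Count every candidate in one pass over the database.
counts_1 = count_candidates(pages, candidates_1)

# F_1: candidates that meet the minimum support; the rest are dropped.
frequent_1 = [w for w in candidates_1 if counts_1[w] >= MIN_SUPPORT]

print("C_1   :", candidates_1)
print("Counts:", [counts_1[w] for w in candidates_1])
print("F_1   :", frequent_1)
```

With this made-up database no candidate falls below the threshold, so `frequent_1` equals `candidates_1`, mirroring ${\cal F}_1 = {\cal C}_1$ in the example.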



Mohammed Zaki
10/30/1998