Let's see how WebMiner finds common key words. The algorithm used by
WebMiner takes multiple steps. In each step it counts the number of
occurrences of a set of words. In step i it counts sets of size i.
We say that a word set is frequent if it appears in at least a
fraction min_per of the web pages. The value of min_per is an
input to the algorithm. For example, if min_per = 0.5 (50%), then any set
of words that occurs in at least 3 (50% of 6) pages is considered
frequent.
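The frequency test described above can be sketched in a few lines of Python. This is only an illustration, not WebMiner's actual code: the function name frequent_word_sets and the six sample pages are hypothetical, and the sketch simply enumerates every word set of the requested size on each page, which may not be how WebMiner generates the sets it counts.

```python
from collections import Counter
from itertools import combinations
import math


def frequent_word_sets(pages, size, min_per):
    """Return {word_set: page_count} for sets of `size` words that
    occur in at least a fraction `min_per` of the pages."""
    # Smallest number of pages a set must occur in to be frequent.
    min_count = math.ceil(min_per * len(pages))
    counts = Counter()
    for words in pages:
        # Count each set at most once per page; sort so that the same
        # set always yields the same tuple.
        for word_set in combinations(sorted(set(words)), size):
            counts[word_set] += 1
    return {ws: c for ws, c in counts.items() if c >= min_count}


# Hypothetical pages (NOT the example data of this document).
pages = [
    ["data", "mining", "web"],
    ["data", "mining"],
    ["data", "web"],
    ["mining", "search"],
    ["data", "mining", "web"],
    ["search"],
]

step1 = frequent_word_sets(pages, 1, 0.5)  # step 1: sets of size 1
step2 = frequent_word_sets(pages, 2, 0.5)  # step 2: sets of size 2
print(step1)
print(step2)
```

With 6 pages and min_per = 0.5 the threshold is 3 pages, so in this made-up data "search" (2 pages) is not frequent, while the pair ("data", "mining") (3 pages) is.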
Let's go back to our example and look at the different steps, assuming
min_per = 0.5.
Mohammed Zaki
10/30/1998