
Finding Common Key Words

Let's see how WebMiner finds common key words. The algorithm proceeds in multiple steps, and in each step it counts the number of occurrences of a collection of word sets: in step i it counts word sets of size i. We say that a word set is frequent if it appears in at least a fraction min_per of the web pages. The value of min_per is an input to the algorithm. For example, if min_per = 0.5 (50%), then with our 6 web pages any set of words that occurs in at least 3 (50% of 6) pages is considered frequent.
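The step-by-step counting described above can be sketched in a few lines of Python. This is only an illustrative sketch, not WebMiner's actual code: the function name frequent_word_sets, the six sample pages, and the way candidate sets of size i+1 are built (from words of frequent sets of size i, keeping a candidate only if all of its smaller subsets are frequent) are assumptions made for this example.

```python
from itertools import combinations


def frequent_word_sets(pages, min_per):
    """Return {word set: page count} for every frequent word set.

    pages   -- list of pages, each given as a list of key words
    min_per -- minimum fraction of pages (e.g. 0.5) a set must appear in
    """
    page_sets = [set(p) for p in pages]
    total = len(page_sets)

    # Step 1 candidates: every single word seen on any page.
    candidates = {frozenset([w]) for p in page_sets for w in p}
    frequent = {}
    i = 1
    while candidates:
        # Step i: count the pages that contain each candidate set of size i.
        counts = {c: sum(1 for p in page_sets if c <= p) for c in candidates}
        level = {c: n for c, n in counts.items() if n / total >= min_per}
        frequent.update(level)

        # Assumed candidate generation for step i+1: combine the words that
        # survived step i, and keep a set only if all of its size-i subsets
        # were frequent.
        i += 1
        words = sorted(set().union(*level)) if level else []
        candidates = {
            frozenset(c)
            for c in combinations(words, i)
            if all(frozenset(s) in level for s in combinations(c, i - 1))
        }
    return frequent


# Hypothetical pages, NOT the WebPage data used in this tutorial.
sample_pages = [
    ["data", "mining", "web"],
    ["data", "mining"],
    ["data", "web"],
    ["mining", "web", "data"],
    ["search"],
    ["data", "search"],
]

result = frequent_word_sets(sample_pages, 0.5)
for word_set, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(word_set), count)
```

With these made-up pages and min_per = 0.5, a set must appear in at least 3 of the 6 pages. The word "search" appears in only 2 pages, so it is dropped in step 1, and no set containing it is counted in later steps.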

Let's go back to our example and look at the different steps, assuming min_per = 0.5.


Mohammed Zaki