
Program synopsis, input and output format

The program will be run with the command line:

    WebMiner input_datafile min_per

The input file will have the following format:

NumWebPages
MaxKeyWords
WebPage Num_KeyWords  KeyWords
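.....

Each page line gives the page number, the number of keywords on that page, and then the keywords themselves. For our running example, the input file will look like: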
6
5
1 3 1 2 5
2 4 1 2 3 4
3 5 1 2 3 4 5
4 4 1 2 3 5
5 4 1 2 3 4
6 3 1 2 5
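
As an illustration of the format, here is a minimal Python sketch of a reader for this input. The names parse_input and EXAMPLE_INPUT are hypothetical and not part of the assignment. The sketch assumes whitespace-separated integers and checks that each page line carries as many keywords as its Num_KeyWords field announces.

```python
EXAMPLE_INPUT = """\
6
5
1 3 1 2 5
2 4 1 2 3 4
3 5 1 2 3 4 5
4 4 1 2 3 5
5 4 1 2 3 4
6 3 1 2 5
"""


def parse_input(text):
    """Parse the input format into (num_pages, max_keywords, pages).

    pages maps each page number to a frozenset of its keywords.
    Hypothetical helper: the assignment does not prescribe this interface.
    """
    # Skip blank lines and split every remaining line into fields.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    num_pages = int(rows[0][0])      # first header line: NumWebPages
    max_keywords = int(rows[1][0])   # second header line: MaxKeyWords
    pages = {}
    for fields in rows[2:2 + num_pages]:
        page_id, declared = int(fields[0]), int(fields[1])
        keywords = [int(k) for k in fields[2:]]
        if len(keywords) != declared:
            raise ValueError(
                f"page {page_id}: expected {declared} keywords, "
                f"found {len(keywords)}"
            )
        pages[page_id] = frozenset(keywords)
    if len(pages) != num_pages:
        raise ValueError("fewer page lines than NumWebPages announces")
    return num_pages, max_keywords, pages


if __name__ == "__main__":
    n, m, pages = parse_input(EXAMPLE_INPUT)
    print(n, m, sorted(pages[3]))
```

Storing each page as a set makes the later "does this page contain every word of the set" check a simple subset test.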

With min_per = 0.5, your output should look like this (each word set is followed by its count, and each Length header gives the number of word sets of that length in parentheses):

Length 1 (5)
1 - 6
2 - 6
3 - 4
4 - 3
5 - 4

Length 2 (8)
1 2 - 6 
1 3 - 4
1 4 - 3
1 5 - 4
2 3 - 4 
2 4 - 3
2 5 - 4
3 4 - 3

Length 3 (5)
1 2 3 - 4
1 2 4 - 3
1 2 5 - 4
1 3 4 - 3
2 3 4 - 3

Length 4 (1)
1 2 3 4 - 3

Total = 19
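
The counts above can be cross-checked with a short Python sketch. It is a plain level-wise enumeration, not necessarily the WebMiner algorithm itself. It assumes that a word set is reported when its count is at least min_per times the number of pages (3 of 6 pages here), which matches the sets with count 3 in the listing. The names frequent_word_sets, format_report and PAGES are hypothetical.

```python
from itertools import combinations

# Keyword sets of the six pages in the running example.
PAGES = [
    {1, 2, 5},
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},
    {1, 2, 3, 5},
    {1, 2, 3, 4},
    {1, 2, 5},
]


def frequent_word_sets(pages, min_per):
    """Return {length: [(word_tuple, count), ...]} for all frequent sets."""
    # Assumed threshold: count >= min_per * number of pages.
    # The small epsilon guards against floating-point rounding.
    min_count = min_per * len(pages) - 1e-9
    words = sorted(set().union(*pages))
    candidates = [(w,) for w in words]
    results, length = {}, 1
    while candidates:
        level = []
        for cand in candidates:
            count = sum(1 for page in pages if page.issuperset(cand))
            if count >= min_count:
                level.append((cand, count))
        if not level:
            break
        results[length] = level
        # Build the next level from surviving words, keeping only
        # candidates whose every subset one word shorter is frequent.
        frequent = {cand for cand, _ in level}
        survivors = sorted({w for cand in frequent for w in cand})
        length += 1
        candidates = [
            cand for cand in combinations(survivors, length)
            if all(sub in frequent for sub in combinations(cand, length - 1))
        ]
    return results


def format_report(results):
    """Render the results in the output layout shown above."""
    lines, total = [], 0
    for length in sorted(results):
        level = results[length]
        lines.append(f"Length {length} ({len(level)})")
        for word_set, count in level:
            lines.append(" ".join(map(str, word_set)) + f" - {count}")
        lines.append("")
        total += len(level)
    lines.append(f"Total = {total}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(frequent_word_sets(PAGES, 0.5)))
```

For example, the set 3 5 is dropped because only pages 3 and 4 contain both words, giving a count of 2, which is below the threshold of 3.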



Mohammed Zaki
10/30/1998