The program will be run with the following command line:
WebMiner input_datafile min_per
The input file will have the following format:
NumWebPages MaxKeyWords
WebPage Num_KeyWords KeyWords
.....
For our running example, the input file will look like:
6 5
1 3 1 2 5
2 4 1 2 3 4
3 5 1 2 3 4 5
4 4 1 2 3 5
5 4 1 2 3 4
6 3 1 2 5
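One possible way to read this format is sketched below in Python. It assumes that all fields are whitespace-separated integers and that line breaks carry no meaning, so the file can be read as one token stream. The function name `parse_pages` is hypothetical and not part of the assignment.

```python
def parse_pages(text):
    """Parse the input format into (num_pages, max_keywords, {page_id: keyword_set}).

    Assumption: every field is a whitespace-separated integer.
    """
    tokens = [int(tok) for tok in text.split()]
    num_pages, max_keywords = tokens[0], tokens[1]
    pages = {}
    pos = 2
    for _ in range(num_pages):
        # Each record is: WebPage Num_KeyWords KeyWords...
        page_id, count = tokens[pos], tokens[pos + 1]
        keywords = tokens[pos + 2 : pos + 2 + count]
        if len(keywords) != count:
            raise ValueError(f"page {page_id}: expected {count} keywords")
        pages[page_id] = set(keywords)
        pos += 2 + count
    return num_pages, max_keywords, pages


# The running example from the text, as one token stream.
example = (
    "6 5 "
    "1 3 1 2 5 "
    "2 4 1 2 3 4 "
    "3 5 1 2 3 4 5 "
    "4 4 1 2 3 5 "
    "5 4 1 2 3 4 "
    "6 3 1 2 5"
)
num_pages, max_keywords, pages = parse_pages(example)
print(sorted(pages[4]))  # keywords of web page 4
```

Storing each page's keywords as a set makes the later "does this page contain this word set" check a simple subset test.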
With min_per = 0.5, your output should look like the following (each line gives a word set followed by its count):
Length 1 (5)
1 - 6
2 - 6
3 - 4
4 - 3
5 - 4
Length 2 (8)
1 2 - 6
1 3 - 4
1 4 - 3
1 5 - 4
2 3 - 4
2 4 - 3
2 5 - 4
3 4 - 3
Length 3 (5)
1 2 3 - 4
1 2 4 - 3
1 2 5 - 4
1 3 4 - 3
2 3 4 - 3
Length 4 (1)
1 2 3 4 - 3
Total = 19
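The counts above can be reproduced with a short level-wise search, sketched here in Python. Two assumptions are made that the text does not state explicitly: min_per is read as the minimum fraction of web pages that must contain a word set, and candidates of length k are built only from word sets of length k - 1 that met the threshold (an Apriori-style pruning, not necessarily the method the assignment requires). The names `frequent_word_sets` and `format_report` are hypothetical.

```python
from itertools import combinations


def frequent_word_sets(pages, min_per):
    """Return {length: [(word_tuple, count), ...]} for word sets that appear
    in at least a min_per fraction of the pages (assumed meaning of min_per)."""
    n = len(pages)
    words = sorted({w for kws in pages for w in kws})
    candidates = [(w,) for w in words]
    levels = {}
    k = 1
    while candidates:
        frequent = []
        for cand in candidates:
            # Count the pages whose keyword set contains every word in cand.
            count = sum(1 for kws in pages if kws.issuperset(cand))
            if count / n >= min_per:
                frequent.append((cand, count))
        if not frequent:
            break
        levels[k] = frequent
        k += 1
        kept = {cand for cand, _ in frequent}
        items = sorted({w for cand in kept for w in cand})
        # Keep a longer candidate only if all of its (k-1)-subsets were frequent.
        candidates = [
            c for c in combinations(items, k)
            if all(sub in kept for sub in combinations(c, k - 1))
        ]
    return levels


def format_report(levels):
    """Render the result in the output layout shown in the text."""
    lines = []
    total = 0
    for k in sorted(levels):
        lines.append(f"Length {k} ({len(levels[k])})")
        for words, count in levels[k]:
            lines.append(f"{' '.join(map(str, words))} - {count}")
        total += len(levels[k])
    lines.append(f"Total = {total}")
    return "\n".join(lines)


# Keyword sets of the six web pages in the running example.
pages = [
    {1, 2, 5},
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},
    {1, 2, 3, 5},
    {1, 2, 3, 4},
    {1, 2, 5},
]
levels = frequent_word_sets(pages, 0.5)
report = format_report(levels)
print(report)
```

With six pages and min_per = 0.5, a word set must appear on at least three pages. That is why, for example, the pair 3 5 (present on only two pages) is missing from the Length 2 group.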