
Data Analysis

Every frequent set in the output is a subset of either 1234 or 125. We therefore look for the pages that contain all of 1234, which gives the page set $\{2, 3, 5\}$; the pages that contain all of 125 form the set $\{1, 3, 4, 5\}$. Converting the identifiers back to page names and keywords, we find:
   word set ---- web pages
   ------------------------------------------------
   1234 ---- 2, 3, 5
   computer_vision, computer_systems, programming_languages,
   computation_theory
        ---- RPI_2, MIT, UofR_1

   125 ---- 1, 3, 4, 5
   computer_vision, computer_systems, semiconductors
        ---- RPI_1, MIT, NWU, UofR_2
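
The lookup described above (find every page that contains all the words of a frequent set) can be sketched in a few lines of Python. This is only an illustration, not WebMiner's actual code: the `pages` dictionary is hypothetical data made up so that the lookup reproduces the page sets quoted in the text, and the function name `pages_containing` is an assumption.

```python
# Hypothetical page -> word-identifier data, invented for illustration only.
# It is chosen so that the lookups reproduce the page sets quoted in the text.
pages = {
    1: {1, 2, 5},
    2: {1, 2, 3, 4},
    3: {1, 2, 3, 4, 5},
    4: {1, 2, 5},
    5: {1, 2, 3, 4, 5},
}


def pages_containing(word_set, pages):
    """Return the sorted ids of pages whose words include every word in word_set."""
    return sorted(pid for pid, words in pages.items() if word_set <= words)


cs_pages = pages_containing({1, 2, 3, 4}, pages)
ee_pages = pages_containing({1, 2, 5}, pages)
print(cs_pages)  # [2, 3, 5]
print(ee_pages)  # [1, 3, 4, 5]
```

The `<=` operator on Python sets is the subset test, so a page qualifies only if it contains every word of the frequent set.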

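The word set 1234 includes programming_languages and computation_theory, so we can classify the web pages RPI_2, MIT, UofR_1 as belonging to CS departments. The word set 125 includes semiconductors, so the pages RPI_1, MIT, NWU, UofR_2 belong to EE departments. Note that MIT belongs to both CS and EE.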
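
As a small sketch of this classification step, the two page lists can be inverted into a page-to-label map, which makes pages with more than one label easy to spot. The page names and the CS/EE labels are taken from the text; the helper name `labels_by_page` is an assumption introduced for illustration.

```python
from collections import defaultdict

# Page lists copied from the table in the text, keyed by department label.
groups = {
    "CS": ["RPI_2", "MIT", "UofR_1"],
    "EE": ["RPI_1", "MIT", "NWU", "UofR_2"],
}


def labels_by_page(groups):
    """Invert label -> pages into page -> set of labels."""
    labels = defaultdict(set)
    for label, names in groups.items():
        for name in names:
            labels[name].add(label)
    return dict(labels)


labels = labels_by_page(groups)
# Pages that received more than one label.
both = sorted(name for name, found in labels.items() if len(found) > 1)
print(both)  # ['MIT']
```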



Mohammed Zaki
10/30/1998