next up previous
Next: Program synopsis, input and Up: No Title Previous: Candidate HashTree

WebMiner Algorithm Pseudo-Code

We will now give the pseudo-code for the different steps of the algorithm. The main WebMiner algorithm is given below:
    MINCOUNT = ceil (min_per*NumWebPages);

    F1 = getF1();  //special case for first step

    for (k=2; F_{k-1} != NULL ; k++){

         /*Candidate generation */
         C_k = new HashTree;
         for each word-set in F_{k-1}, say w1{
             for each word-set in F_{k-1} after w1 (i.e. > w1), say w2{
                 if (w1 and w2 share common prefix of length k-2){
                     cand = union (w1, w2);
                     if (!prune(cand, F_{k-1})) C_k.Insert(cand);
                 }
             }
         }
         delete F_{k-1};         

         /*Candidate Counting*/
         for each input record, say R{
             for each k length subset of R, say S{
                   S' = C_k.find(S);
                   if (S' != NULL) S'.count++;
             }
         }

         /*Frequent set generation*/}
         for each set, say S, in C_k
             if (S.count >= MINCOUNT) F_k.push_back(S);

        sort(F_k)
        print F_k;
        delete C_k;
    }



Mohammed Zaki
10/30/1998