A word-set of length k has a vector of ints of length k for storing the different elements, and it has a count field for storing the frequency. The word-set is sorted. Note that you will never have to explicitly sort the word-set. The nature of the candidate generation procedure should guarantee that the new word-sets in each step are always sorted.
The set of frequent word-sets of length k, ,
should be stored as a sorted vector or
linked-list of word-sets. For efficiency, store new elements using
push_back(), and then sort the vector before printing.
The set of candidate word-sets of length k, must be
stored in a HashTree data
structure, described below.