Your successful completion of the ``Object Oriented Airlines'' project has resulted in union strike and a referral to File Compression, Inc., a company specializing in software for compressing the size of files. Mr. Zvi Lepmel, Chief Technical Officer of FCI, has brought you in as a consultant to help with the implementation of a new product that performs a type of compression known as Huffman coding. (You can find some background information on this technique in your textbook in Section 10.1.2.)
Huffman coding requires the ability to manipulate binary trees, so the main deliverable for this project will be the implementation of a binary tree class, similar to the generic container classes found in the C++ Standard Library. You have already met with the FCI team and sketched out the following interface for the class:
template <class T> class binary_tree { public: binary_tree(); ~binary_tree(); binary_tree(const T &); binary_tree(const binary_tree &); binary_tree & operator=(const binary_tree &); friend class iterator; class iterator { public: friend class binary_tree; iterator(); ~iterator(); T & operator*() const; bool operator!=(const iterator &) const; bool operator==(const iterator &) const; iterator left() const; // go to left child node iterator right() const; // go to right child node iterator parent() const; // go to parent node bool is_root() const; // true if at root bool is_leaf() const; // true if at a leaf }; iterator begin() const; iterator end() const; T & root(); void combine(binary_tree & t1, binary_tree & t2); };
In addition to the usual constructors and destructors, the binary tree class also contains an iterator class for traversing over the nodes in the tree. Binary tree iterators support some of the normal iterator operations, plus some special functions for traversing binary trees. Additional member functions of the binary tree class provide iterators to the "begin" (root) node and a special "end" node not in the tree, and fetch the value stored in the root node of the tree.
Here is an example of how the binary tree class and its iterator class is intended to be used:
// Print out contents of tree using an ``in-order'' traversal. void inorder_traversal(const binary_tree<int>::iterator & i) { if (i.is_leaf()) { std::cout << *i << " "; } else { inorder_traversal(i.left()); std::cout << *i << " "; inorder_traversal(i.right()); } } void main() { binary_tree<int> tree; ... // put some stuff into the tree inorder_traversal(tree.begin()); }
One member function deserves special mention. Since a binary tree container will be made up of dynamically allocated nodes, it is possible to "combine" two binary trees as the children of the root node of a third binary tree in constant time (this is similar to how STL lists can be spliced together). A special function is provided in the header and should be implemented by you to do exactly this. The nodes in t1 become the left subtree, and the nodes in t2 become the right subtree of the third tree (represented by *this). The "root" tree must consist of only a single node, and both t1 and t2 are empty afterwards. This sounds complicated, but remember it will be done in constant time!
With the help of this binary tree class (and some useful standard library components), a class for performing Huffman encoding and decoding can be implemented. The interface for this class is quite simple it provides two member functions, one to encode and one to decode. Both use standard library streams for their input and output.
class huffman_codec { public: void encode(std::istream &, std::ostream &); void decode(std::istream &, std::ostream &); };
The Huffman encoding algorithm is as follows:
To decode a file:
Writing out the encoding tree is fairly straightforward. First, we can ignore the weight information, since it is only used in the initial construction of the tree from the frequency data. Also, we can take advantage of the fact that every node in the tree is either a leaf or has exactly two children.
The algorithm to write out the tree is recursive:
Reading in and reconstructing the tree is also recursive:
The program will be run twice with the command-lines
huffman encode input-file compressed-file huffman decode compressed-file outputThe decoded output will be compared to (and should be identical with) the input file. You can use any file for sample input (even binary files).
This directory is also accessible directly on the RCS file systemProject 2 Auxiliary Files Directory
/dept/cs/cs230/public/projects/project2Be sure to carefully look over the code and make sure you understand it.
You will need to complete certain parts of the Huffman encoding/decoding algorithms in the file huffman_codec.cpp. Search for the string "COMPLETE" in the source code and follow the comments.
bool read(std::istream & in);The bit_packer class provides two functions, one that writes a string of bits to a stream, and a second that should be used after you have processed the last bit to ensure all bits are actually written (this is necessary since the system can only write bytes, or groups of 8 bits, at a time):
void write(std::ostream &, const std::basic_string<bool> &); void flush(std::ostream &);
std::basic_string<bool>This type works just like the std::string class you have used before (including working with standard library iterators and algorithms).
c:\program files\devstudio\vc\include\iosfwd(83) : warning C4804: '<' : unsafe use of type 'bool' in operation c:\program files\devstudio\vc\include\iosfwd(127) : warning C4305: 'return' : truncation from 'const int' to 'bool'These are in the standard library and can be safely ignored.
Your program must compile and run using the Visual C++ developement environment.
The final submission deadline is November 10 at 11:59:59 pm. Projects submitted later than this will NOT be accepted for credit.
Detailed project submission guidelines will be posted on the course web site.
The program will be graded 25% for a clean compilation, 40% for correctness of the program, and 35% for program structure. Criteria for program correctness may be discerned from the foregoing description and will solely be based on the output from your program. Program structure includes class organization, member function design, code readability (indentation, variable names, etc.), and commentary. There should be comments describing the purpose of each class and the member functions and member variables. The comments about each class must also include a discussion of the flexibility of the class design and how well it will support future changes. Keep other comments, especially within the body of each function, short and concise. Avoid comments that just state in English what is obvious from the code. One bad example is
i ++; // increment i
Students may discuss the class and algorithm design. All code must be implemented by the student alone. Sharing of code among students is expressly forbidden and will be detected by code comparison tools. Projects from all six sections will be graded together by the TAs.