Advanced Programming---Rensselaer CSCI-6090 -- Homework 2

Go backward to Homework 1
Go up to Homework
Go forward to A Solution to Homework 2

Homework 2

This homework assignment is due on Friday, October 12 at 11:59:59 pm, and it will be graded. (Counts about 10% of the course grade.) On this homework, no collaboration is allowed; all work must be your own.
Late submissions: Assignments turned in up to 24 hours after the due date will receive a 10% penalty. After that, late assignments will receive a 20% penalty up until the time solutions are posted or discussed in class, after which they cannot be accepted at all. A solution to this assignment will be posted at 6pm on Thursday, October 18. Students should not discuss their solutions or attempts at a solution with any other student before that time, even if they have already submitted their work, and even if they have already received a grade on it.
See also the expanded submission instructions, additional notes, and frequently asked questions at the end of this section.
Topological sorting

A topological sort of a directed acyclic graph G=(V,E) is a linear ordering of all its vertices such that if G contains an edge (u,v) then u appears before v in the ordering. Note that a given graph generally has more than one topological ordering.
Chapter 3 in the BGL tutorial describes a generic topological sort algorithm topological_sort that is based on supplying an appropriate visitor to depth first search. The documentation of this algorithm is given in section 13.2.5 of the text. However, for this assignment we shall be using a different approach.
Another way to perform topological sorting on a directed acyclic graph G=(V,E) is to repeatedly find a vertex of indegree 0, output it, and remove it and all of its outgoing edges from the graph. This can be implemented as an O(|V| + |E|) algorithm (same as the BGL algorithm in section 13.2.5), by keeping proper track of vertices with indegree 0; i.e, at any point it is needed you must be able to find such a vertex in constant time.
Algorithm and testing method development
There are three principal functions to be developed:
Implement the above approach (in C++ using BGL), as a generic algorithm with basically the same interface as the BGL algorithm as described in section 13.2.5. Note that that interface does not assume bidirectionality of the graph representation, so your algorithm will have to initially compute and keep track of indegrees. Your algorithm should have an O(|V| + |E|) time bound.
Note: There is an error in the specification in Section 13.2.5, in saying that the input can be "a directed or undirected graph." It must be directed.

Specify and implement an topological sort acceptance testing function. This is a boolean valued function that accepts an acyclic directed graph and a linear sequence of vertices as inputs, and returns true if and only if the sequence is a topological ordering of the graph. (This function will be needed in order to check the result of running your topological sort algorithm on large graphs, as required in the second part of the assignment, since for these graphs it will be too difficult to check the result by hand.)
Note that you cannot implement this function by calling another (trusted) topological sorting algorithm and comparing its result with the given ordering, since the other algorithm may produce a correct but different ordering.
In general, an acceptance testing function should be written in as simple a way as possible, so that its own correctness is easily verified by inspection of the code. But if it is to be applied to large data sets you do have to be careful that it doesn't take more time than the algorithm it is being used to check! (Thus in this case, it should have an O(|V| + |E|) bound, whereas an "obvious" method might take O(|V|² + |E|).)

Specify and implement a function that generates random acyclic graphs of a given size and denseness. This function will be needed in order to test your topological sort algorithm on large graphs. You should be able to control the denseness of graph by varying the probability of edge creation in the graph. For example a probability of 0.7 would mean that, out of all the 10 possible edges in the graph, 7 of the edges are actually created, on average. One problem in creating random graphs is that cycles may appear in the graph. To make sure you create random acyclic graphs, use the following method:

Let the vertices be numbered from 1 to n.

Whenever adding an edge to vertex k, always choose one of the vertices numbered between k+1 to n. The choice of vertices in this range is done randomly. This will ensure that there is no edge going back to any of the parent vertices in the graph.

Although this method eliminates some randomness in the graph creation, it still is a reasonable way for purposes of algorithm testing to create large acyclic directed graphs.

Testing methodology
Test your programs as follows:

Start testing your algorithm with small graphs for which the result can be hand-checked. Chapter 3 (File Dependency graph) and section 13.2.5 each have such an example.

Once you get the above part running you can go ahead and test your algorithm on large random graphs. In such tests you will use your random graph generator to generate a graph, apply your algorithm to it, and check the correctness of the result with your acceptance testing function. Among the tests you do, include one with a graph with 1,000 vertices and about 5,000 edges (edge probability 0.01). (Originally the assignment said 0.05, but as as noted in class on 10/5, this was incorrect. A complete DAG with N vertices has N(N-1)/2 edges, so for N = 1,000, about 500,000 edges.)

Submission
Submit all of your code including the program or programs that set up the test cases. Add a README file that briefly describes the programs. It should also describe the results of the testing, explicitly mentioning any case in which one of your programs does not work as specified or does not finish or takes an excessive amount of time. Put all the files in a directory and zip the directory. The directory should be named the same as your rcs-id. E.g., my rcs-id is mussed (different from my CS lab id), so I would put my files in a directory named mussed and execute (from the directory level just above it)
zip -r mussed mussed
This would create a zip file named mussed.zip. Mail the zip file to Arvind (venkaa@cs.rpi.edu) on or before the due date. (Note that zip is available on campus Unix systems as well as Windows boxes.)
See also the additional submission requirements posted by Arvind on his Web homepage. (Check them again for any updates just before you submit.)
Additional Notes

Data files for one of the suggested small example graphs are available in the 9/21/2001 news article. As I mentioned in class, the graph created by reading in these files is slightly different from the graph in Figure 3.1 in the BGL book. The reason it is different is just that when I assigned numbers to the vertices in the file, I used the numbers 1 through 15, whereas I should have used 0 through 14 to make them convenient for use as vector indices. So I added a new vertex 0 and gave it the name foo.cpp. This choice of name was a slight mistake too, since there was already a name foo.cpp in the data. This causes no harm in terms of the way the algorithm works (it is working on the distinct vertex numbers 0 through 15), but makes the output look a little strange. I have since changed the name of vertex 0 in the online file.

For certain operations a hash table is the data structure of choice, since it offers (average case) constant time insertion and lookup. For example, checking equality between two sets of values of an arbitrary type (not just integer values) can be done in linear time. If you want to use a hash table for such purposes, use the hash_set or hash_map containers in the GNU C++ library (which are adapted from SGI STL). Unfortunately the C++ standard does not require such hash tables to be included in the standard library, so your code that uses them may not be portable. If you wish to use another compiler or maintain portability to other compilers, you may use the standard set or map container instead. This will likely introduce an extra log factor in the computing times of your algorithm; e.g., your acceptance testing function could become an O(|V| log|V| + |E|) algorithm instead of meeting the stated requirement, O(|V| + |E|). This will be permitted without penalty. You should not program your own hash table.

Questions Asked (Some of Them Frequently)

Q.
I'm working on the homework and I've run into a snag if I define my graph to be as follows: typedef adjacency_list<listS, vecS, directedS> Graph Graph G then call: in_degree(*vi,G) where vi is a vertex_iterator I get a compilation error stating that Graph doesn't have a member m_in_edges. I read further in the BGL book in section 12.1.4 and it says that only BidirectionalGraph supports in_degree which conflicts with what is written in the assignment where it says the graph needs to be directed.

A. Remember, the assignment says "Note that that interface does not assume bidirectionality of the graph representation, so your algorithm will have to initially compute and keep track of indegrees." If the graph were bidirectional you could get indegrees almost for free just from the iterators for incoming edges. Instead you'll have to compute them initially by traversing the graph and counting how many edges have a vertex as its target. Then you need to maintain the indegrees as the algorithm proceeds.

Q.
I have question about homework 2. I've just implemented my idea and tried to run against the graph on page 206 of the BGL book. But my program couldn't process the graph correctly. The graph seems to have a cycle. A cycle detection function introduced in page 54 of the book also says the graph is cyclic. Should my program process it correctly?

A. Get rid of the (f,f) edge. (Apparently the topological sort algorithm in the BGL works even if you have such a self-edge, but the new algorithm you implement for this assignment doesn't have to.)

Q. I nearly finished my source code, but couldn't figure out how to compile it on VC++ or is VC++ ok for this program?
A. As I've said in class, you might be able to compile BGL programs with VC++ -- or you might not, since VC++ doesn't support the C++ standard very well. If you don't want to struggle with whatever problems come up, you should use g++. You can do this on the CSLab Sun workstations or on your Windows box if you download and install the Cygnus tools (which you really should have anyway since otherwise a Windows box is only marginally useful for CS purposes). See the course Web page articles on compiling programs with the boost lib using g++.

Q.
For checking if the topological order that i generate is correct or not, is the following a sufficient condition to check... 1. For all the vertices, store their indegrees in a vector...O(V) 2. For each vertex in the topologial order generated, check if the vertex has indegree 0 and for all the outedges decrement the indegrees of the target vertex in the vector generated in step 1. ...O(V+E). For the ordering to be correct, the vertex being checked should have value 0 in the indegree vector. Am i missing any other condition that i should check ?

A. Sounds like you are trying to do the check by programming the same algorithm. This won't work; if you have a mistake in your original algorithm you likely have the same mistake in the version you use for checking. To write the checking function, you should concentrate on testing what the definition of a topological ordering says: given a DAG G and a linear sequence S, check that for every (u, v) that is an edge in G, u appears before v in S. Also check that all vertices in G, and only those vertices, appear in S.

Q. Since we are writing a generic random graph generator and the node of the graph can be any type (therefore generic), how can our program know how to detect any difference between those nodes?
A. It isn't required that your random DAG generator be a generic algorithm. It's okay if it only produces graphs with integers 0, 1, 2, ..., N-1 as vertex descriptors.

Q. If our DAG generator isn't generic and we produce graphs with integers as vertex descriptors, doesn't that mean that our sort function only needs to accept int type? Why even bother with generic topo sort?
Can we write our sort function given the knowledge that the vertex descriptors are integers?
Because I don't see how we gonna write our sort function to be generic without using a std::pair for the vertex.
A. No, the purpose of testing with large random graphs is to mainly to make sure that your algorithm is reasonably efficient; it is not intended to test for other things such as that your algorithm is generic. The requirement that the algorithm is generic still holds. Thus you should not assume that the graph's vertex descriptors are integers. Nor should you assume that they are std::pairs. If in your algorithm you need to use vertex descriptors as unique ids, try using a hash_map or map with the vertex descriptor type as the key type.

Q. For homework 2, when designing the randomly generated graphs, should they be fully connected? I thought we should do so, then decided maybe it's not needed. Which should we do? Full connections, or allow multiple chunks of graphs?
A. Fully-connected graphs are not required.

Q. I have tried to get the program in section 13.2.5 to work, and just can't get it to compile. I know that it has to do with the actual call to topological_sort that I am making, but even after looking at the boost code that you have shared, I can't figure out what is wrong. I'm sure that I'm totally missing something important, but if I can't even get the example going I have no hope for the homework. I have included the source code that I am using.
A. It may be that your problem is just that you need to increase template instantiation depth. I tried compiling your program with -ftemplate-depth-30 on the command line, and it compiled and ran properly. [I shortly afterward got a reply that said it did solve the problem.]

Q. In part 1, do we need error checking that tests whether the graph is acyclic, or is it just assumed that only acyclic graphs will be passed in?
A. You can assume that acyclic graphs will be passed in.

Q. I'm finishing up the random graph generator (which is actually pretty easy), and I'm pretty sure that the formula given on the website for maximum number of edges on a DAG with N vertices should be N(N - 1)/2, not N(N + 1)/2. With the latter formula, self-edges are included. A quick example is a graph with 1 vertex must have 0 edges, but by the formula, 1(1 + 1)/2 = 1. Is this correct?
A. Yes, it should be N(N-1)/2 for the maximum number of edges on a DAG. This has been corrected in the online problem statement.

Q. Also, does the topological sorting function have to work if listS is selected for the vertex list, i.e., adjacency_list<vecS, listS, directedS>? It seems that my algorithm would be quicker with vecS for the vertex list, because I am not doing any insertions or removals. Although I'm not sure if it'd still be considered generic if it didn't accept listS. What do you think?
A. The topological sort should work even if you have listS for the vertex list. How do you know? Because the specification of the BGL topological sort in Section 13.2.5 says that "the graph type must be a model of VertexListGraph and IncidenceGraph." If you look at the requirements of these concepts (see p. 159), you see that listS for the vertex list satisfies those requirements.

Q. I just realized that my sorting function gives the topological sort in its normal order (not reverse). Should I reverse it inside the sorting function to keep it exactly the same as the topological sort from the BGL book?
A. Either way will be accepted without penalty.