| This homework assignment is due on Friday, October
12 at 11:59:59 pm, and it will be graded.  (Counts about 10% of the
course grade.)  On this homework, no collaboration is allowed;
all work must be your own.
 Late submissions: Assignments turned in up to 24 hours after
the due date will receive a 10% penalty.  After that, late
assignments will receive a 20% penalty up until the time solutions
are posted or discussed in class, after which they cannot be accepted
at all.  A solution to this assignment will be posted at 6pm on
Thursday, October 18.  Students should not discuss their solutions or
attempts at a solution with any other student before that time, even
if they have already submitted their work, and even if they have
already received a grade on it.
 See also the expanded submission instructions,
additional notes, and frequently asked questions 
at the end of this section.
 A topological sort of a directed acyclic graph G=(V,E) is a linear
ordering of all its vertices such that if G contains an edge (u,v)
then u appears before v in the ordering.
Note that a given graph generally has more than one topological ordering.
 Chapter 3 in the BGL tutorial describes a generic topological sort
algorithm topological_sort that is based on supplying an
appropriate visitor to depth first search. The documentation of this
algorithm is given in section 13.2.5 of the text. However, for this
assignment we shall be using a different approach.
 Another way to perform topological sorting on a directed acyclic graph
G=(V,E) is to repeatedly find a vertex of indegree 0, output it, and
remove it and all of its outgoing edges from the graph. This can be
implemented as an O(|V| + |E|) algorithm (same as the BGL algorithm in
section 13.2.5), by keeping proper track of vertices with indegree 0; i.e, 
at any point it is needed you must be able to find such a vertex in constant time.
There are three
principal functions to be developed: Test your programs as follows:Implement the above approach (in C++ using BGL), as a generic
algorithm with basically the same interface as the BGL algorithm as
described in section 13.2.5. Note that that interface does not assume
bidirectionality of the graph representation, so your algorithm will
have to initially compute and keep track of indegrees.  Your algorithm
should have an  O(|V| + |E|) time bound.
Note: There is an error in the specification in Section
13.2.5, in saying that the input can be "a directed or undirected
graph." It must be directed. 
Specify and implement an topological sort acceptance testing function.
This is a boolean valued function that accepts an acyclic directed
graph and a linear sequence of vertices as inputs, and returns true if
and only if the sequence is a topological ordering of the graph.  (This
function will be needed in order to check the result of running your
topological sort algorithm on large graphs, as required in the second
part of the assignment, since for these graphs it will be too
difficult to check the result by hand.)
Note that you cannot implement this function by calling another
(trusted) topological sorting algorithm and comparing its result with
the given ordering, since the other algorithm may produce a correct
but different ordering.
 In general, an acceptance testing function should be written in as
simple a way as possible, so that its own correctness is easily verified
by inspection of the code.  But if it is to be applied to large
data sets you do have to be careful that it doesn't take more time
than the algorithm it is being used to check! (Thus in this case, it
should have an O(|V| + |E|) bound, whereas an "obvious" method
might take O(|V|2 + |E|).) 
Specify and implement a function that generates random
acyclic graphs of a given size and denseness.  This function will be
needed in order to test your topological sort algorithm on large
graphs.  You should be able to control the denseness of graph by
varying the probability of edge creation in the graph. For example a
probability of 0.7 would mean that, out of all the 10 possible edges in
the graph, 7 of the edges are actually created, on average.
	
One problem in creating random graphs is that cycles may appear in the
graph. To make sure you create random acyclic graphs, use the
following method:
	 	 Let the vertices be numbered from 1 to n.
	Whenever adding an edge to vertex k, always choose one
	 of the vertices numbered between k+1 to n. The choice of
	 vertices in this range is done randomly. This will ensure
	 that there is no edge going back to any of the parent vertices
	 in the graph.
	
 Although this method eliminates some randomness in the graph creation,
it still is a reasonable way for purposes of algorithm testing to
create large acyclic directed graphs. 
    Submit all of your code including the program
or programs that set up the test cases. Add a README file that briefly
describes the programs. It should also describe the results of the
testing, explicitly mentioning any case in which one of your programs
does not work as specified or does not finish or takes an excessive
amount of time.  Put all the files in a directory and zip the directory.
The directory should be named the same as your rcs-id.  E.g., my
rcs-id is mussed (different from my CS lab id), so I would put my
files in a directory named mussed and execute (from the directory
level just above it)Start testing your algorithm with small graphs for which
the result can be hand-checked.  Chapter 3 (File Dependency graph) and
section 13.2.5 each have such an example. 
Once you get the above part running you can go ahead and test
your algorithm on large random graphs. In such tests you will use your
random graph generator to generate a graph, apply your algorithm to
it, and check the correctness of the result with your acceptance
testing function. Among the tests you do, include one with a graph
with 1,000 vertices and about 5,000 edges (edge probability 0.01).
(Originally the assignment said 0.05, but as as noted in class on
10/5, this was incorrect. A complete DAG with N vertices has
N(N-1)/2 edges, so for N = 1,000, about 500,000 edges.)
	 
  zip -r mussed mussed
This would create a zip file named mussed.zip.
Mail the zip file to Arvind (venkaa@cs.rpi.edu) on or before the due date.
(Note that zip is available on campus Unix systems as well as 
Windows boxes.) See also the additional submission
requirements posted by Arvind on his Web
homepage. (Check them again 
for any updates just before you submit.)
  Data files for one of the suggested small example graphs are
available in the 9/21/2001 news article.  As I
mentioned in class, the graph created by reading in these files is
slightly different from the graph in Figure 3.1 in the BGL book.  The
reason it is different is just that when I assigned numbers to the
vertices in the file, I used the numbers 1 through 15, whereas I
should have used 0 through 14 to make them convenient for use as
vector indices.  So I added a new vertex 0 and gave it the name
foo.cpp.  This choice of name was a slight mistake too, since there
was already a name foo.cpp in the data.  This causes no harm in terms
of the way the algorithm works (it is working on the distinct vertex
numbers 0 through 15), but makes the output look a little strange.  I
have since changed the name of vertex 0 in the online file. 
For certain operations a hash table is the data structure of
choice, since it offers (average case) constant time insertion and
lookup.  For example, checking equality between two sets of values of
an arbitrary type (not just integer values) can be done in linear
time.  If you want to use a hash table for such purposes, use the
hash_set or hash_map containers in the GNU C++ library (which are
adapted from SGI STL).  Unfortunately the C++ standard does not
require such hash tables to be included in the standard library, so
your code that uses them may not be portable.  If you wish to use
another compiler or maintain portability to other compilers, you may
use the standard set or map container instead.  This will likely
introduce an extra log factor in the computing times of your
algorithm; e.g., your acceptance testing function could become an
O(|V| log|V| + |E|) algorithm instead of meeting the stated
requirement, O(|V| + |E|).  This will be permitted without
penalty.  You should not program your own hash table.
 Q. 
I'm working on the homework and I've run into a snag
if I define my graph to be as follows:
typedef adjacency_list<listS, vecS, directedS> Graph
Graph G
then call:
in_degree(*vi,G) 
where vi is a vertex_iterator I get a compilation error stating that
Graph doesn't have a member m_in_edges.  I read further in the BGL
book in section 12.1.4 and it says that only BidirectionalGraph
supports in_degree which conflicts with what is written in the
assignment where it says the graph needs to be directed.
 A. Remember, the assignment says "Note that that interface does not
assume bidirectionality of the graph representation, so your algorithm
will have to initially compute and keep track of indegrees."  If the
graph were bidirectional you could get indegrees almost for free just
from the iterators for incoming edges.  Instead you'll have to compute
them initially by traversing the graph and counting how many edges
have a vertex as its target. Then you need to maintain the indegrees
as the algorithm proceeds. 
Q. 
I have question about homework 2.  I've just implemented my idea
and tried to run against the graph on page 206 of the BGL book.
But my program couldn't process the graph correctly.
The graph seems to have a cycle.  A cycle detection function
introduced in page 54 of the book also says the graph is cyclic.
Should my program process it correctly?
 A. Get rid of the (f,f) edge.  (Apparently the topological sort
algorithm in the BGL works even if you have such a self-edge, but
the new algorithm you implement for this assignment doesn't have to.) 
Q. I nearly finished my source code, but couldn't figure out
how to compile it on VC++ or is VC++ ok for this program?
  A. As I've said in class, you might be able to compile BGL programs
with VC++ -- or you might not, since VC++ doesn't support the C++
standard very well.  If you don't want to struggle with whatever
problems come up, you should use g++.  You can do this on the CSLab
Sun workstations or on your Windows box if you download and install
the Cygnus tools (which you really should have anyway since otherwise
a Windows box is only marginally useful for CS purposes).  See the
course Web page articles on compiling programs with the boost lib
using g++. 
Q.  
For checking if the topological order that i generate is correct or not,
is the following a sufficient condition to check...
 1. For all the vertices, store their indegrees in a vector...O(V)
 2. For each vertex in the topologial order generated, check if the vertex
has indegree 0 and for all the outedges decrement the indegrees of the
target vertex in the vector generated in step 1. ...O(V+E).
 For the ordering to be correct, the vertex being checked should have
value 0 in the indegree vector.
Am i missing any other condition that i should check ?
 A. Sounds like you are trying to do the check by programming the same
algorithm.  This won't work; if you have a mistake in your original
algorithm you likely have the same mistake in the version you use for
checking.  To write the checking function, you should concentrate on
testing what the definition of a topological ordering says: given a
DAG G and a linear sequence S, check that for every (u, v) that is
an edge in G, u appears before v in S.  Also check that all vertices
in G, and only those vertices, appear in S. 
Q. Since we are writing a generic random graph generator and
the node of the graph can be any type (therefore generic), how can our
program know how to detect any difference between those nodes?
 A. It isn't required that your random DAG generator be a
generic algorithm.  It's okay if it only produces graphs with integers
0, 1, 2, ..., N-1 as vertex descriptors. 
Q. If our DAG generator isn't generic and we produce graphs with integers
as vertex descriptors, doesn't that mean that our sort function only 
needs to accept int type? Why even bother with generic topo sort?
Can we write our sort function given the knowledge that the vertex
descriptors are integers?
 Because I don't see how we gonna write our sort function to be generic
without using a std::pair for the vertex.
 A. No, the purpose of testing with large random graphs is to mainly to
make sure that your algorithm is reasonably efficient; it is not
intended to test for other things such as that your algorithm is
generic.  The requirement that the algorithm is generic still holds.
Thus you should not assume that the graph's vertex descriptors
are integers.  Nor should you assume that they are std::pairs.  If in
your algorithm you need to use vertex descriptors as unique ids,
try using a hash_map or map with the vertex descriptor type as the
key type. 
Q. For homework 2, when designing the randomly generated
graphs, should they be fully connected?  I thought we should do so,
then decided maybe it's not needed.  Which should we do?  Full
connections, or allow multiple chunks of graphs? 
A. Fully-connected graphs are not required.  
Q. I have tried to get the program in section 13.2.5 to
work, and just can't get it to compile.  I know that it has to do with
the actual call to topological_sort that I am making, but even after
looking at the boost code that you have shared, I can't figure out
what is wrong.  I'm sure that I'm totally missing something important,
but if I can't even get the example going I have no hope for the
homework.  I have included the source code that I am using.
 A. It may be that your problem is just that you need to increase
template instantiation depth.  I tried compiling your program with
-ftemplate-depth-30 on the command line, and it compiled and ran
properly. [I shortly afterward got a reply that said it did solve the
problem.] 
Q. In part 1, do we need error checking that tests whether the
graph is acyclic, or is it just assumed that only acyclic graphs will
be passed in?
A. You can assume that acyclic graphs will be passed in. 
Q. I'm finishing up the random graph generator (which
is actually pretty easy), and I'm pretty sure that the formula given
on the website for maximum number of edges on a DAG with N vertices
should be N(N - 1)/2, not N(N + 1)/2.  With the latter formula,
self-edges are included.  A quick example is a graph with 1 vertex
must have 0 edges, but by the formula, 1(1 + 1)/2 = 1.  Is this
correct?
A. Yes, it should be N(N-1)/2 for the maximum number of edges on a DAG.
This has been corrected in the online problem statement.
Q. Also, does the topological sorting function have to
work if listS is selected for the vertex list,
i.e., adjacency_list<vecS, listS, directedS>?  It seems that my
algorithm would be quicker with vecS for the vertex list, because I am
not doing any insertions or removals.  Although I'm not sure if it'd
still be considered generic if it didn't accept listS.  What do you
think?
A. The topological sort should work even if you have listS for
the vertex list.  How do you know?  Because the specification of the
BGL topological sort in Section 13.2.5 says that "the graph type must
be a model of VertexListGraph and IncidenceGraph."  If you
look at the requirements of these concepts (see p. 159), you see that
listS for the vertex list satisfies those requirements.
Q. I just realized that my sorting function gives the
topological sort in its normal order (not reverse).  Should I reverse
it inside the sorting function to keep it exactly the same as the
topological sort from the BGL book?
A. Either way will be accepted without penalty.
 |