Homework 3: Graph Visualization

NOTE: Part 2 of this homework can be done individually, or as a team of two. Regardless of whether you work with a partner for part 2, parts 1 & 3 should be done on your own.

Part 1: Familiarize yourself with Graphviz

Let's experiment with Graphviz, an open source visualization tool for automatically drawing network-like graphs of linked nodes. The papers we read for Lecture 3 describe algorithms at the core of Graphviz.

To get started:

  • Download the Graphviz software and explore the gallery of examples. Learn how to run these examples on your own machine and view the results.

  • Edit some of the input graph files by hand -- see also the DOT file format documentation. Experiment with different display options.

  • Using your favorite programming language, write code to generate and output a range of synthetic input files. Your program should be parameterized so that you can modify the number of nodes, the density/sparseness of edges, etc. Consider using randomness. Don't over-design your program. The goal is to let you prototype and quickly experiment with a variety of graphs. You can tweak and add more to your program as you go.

    Alternatively, you could write in C/C++ and directly compile with the Graphviz codebase.

  • Using your program, create a variety of network connectivities including: a tree, a clique, a planar graph, a bipartite graph, disconnected components, etc. Your graphs should be "medium" sized (10-40 nodes?). They should not be too large -- that is, with a little time and modest drawing skills, a good, "approximately-optimal" layout could be done by hand.

    Try the different options in Graphviz to draw each of these graphs. Which options work best for different types of graphs? Save 5-10 images with your most successful results for inclusion in your homework writeup to demonstrate the breadth of graph variety in your experimentation.
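As a starting point for the generator described above, here is a minimal sketch in Python. The function names (`random_graph`, `random_tree`, `to_dot`) and the `n0`, `n1`, ... node naming are illustrative choices, not part of the assignment, and this is only one of many reasonable designs.

```python
import random


def random_graph(num_nodes, edge_prob, seed=None):
    """Return undirected edges (u, v), u < v, each kept with probability edge_prob."""
    rng = random.Random(seed)
    return [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if rng.random() < edge_prob
    ]


def random_tree(num_nodes, seed=None):
    """Return the edges of a random tree: every node attaches to one earlier node."""
    rng = random.Random(seed)
    return [(rng.randrange(v), v) for v in range(1, num_nodes)]


def to_dot(num_nodes, edges, name="G"):
    """Format an undirected graph as DOT text."""
    lines = [f"graph {name} {{"]
    lines += [f"  n{i};" for i in range(num_nodes)]  # list nodes so isolated ones appear
    lines += [f"  n{u} -- n{v};" for u, v in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # edge_prob=1.0 gives a clique; small values give sparse, possibly disconnected graphs.
    print(to_dot(20, random_graph(20, edge_prob=0.15, seed=1)))
```

Saving the output to a file such as `random.dot` (a hypothetical name) and running `dot -Tpng random.dot -o random.png` renders it. Swapping `dot` for `neato`, `circo`, `fdp`, or `twopi` tries a different layout engine on the same input.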

Part 2: Graph Drawing Application - RPI Course & Instructor Data

Most/all of the students registered for CSCI 4550 are CSCI undergraduate majors and thus hopefully very familiar with the degree requirements for the RPI CSCI major. If you are not a CSCI major, or if you are a graduate student who got your CSCI undergraduate degree at another school, you are encouraged to team up with an RPI CSCI major for part 2. Everyone is encouraged to work with a partner for part 2 -- and to work with a different partner than they worked with for the previous homework. You may also choose to work alone.

The RPI CSCI department undergraduate curriculum committee publishes course planning templates organized by the semester in which students entered RPI. For example, here is the template for students who entered in Fall 2022 and will graduate in May 2026. The university's standardized way of presenting degree requirements is a grid with 8 semesters and 16 credits (4 classes) per term.

This presentation is straightforward, and it provides clear instructions for the generic student who enters RPI with no prior credits and follows the template exactly. However, if the student has AP credit, is a dual major, ends up dropping a course, or chooses to take a humanities elective that conflicts with a required course, this chart doesn't spell out what adjustments are possible.

Here is an alternate visualization prepared by Prof David Goldschmidt showing the prerequisites within the required CSCI-prefix courses typically taken in the first four semesters.

This visualization explains why students should not delay taking certain courses, e.g. CSCI 2200 FOCS, to prevent a prerequisite bottleneck that could delay their graduation. However, this visualization only covers the first two years and only shows the prerequisites within CSCI courses. Furthermore, it does not include information about which terms each course is offered. For example, CSCI 2600 has not been offered in the fall semester since 2019.

We will use the dataset collected for QuACS - the Questionably Accurate Course Scheduler. The dataset is available on GitHub: https://github.com/quacs/quacs-data. It was scraped by RPI CSCI alum Eli Schiff and is maintained by the QuACS team. The data is stored in JSON, a popular and relatively simple file format with easy-to-use parsers for Python, C++, and most modern languages.
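To give a feel for how little code JSON parsing needs, here is a small Python sketch that extracts (course, instructor) pairs. The nested layout and field names used here (`code`, `courses`, `id`, `sections`, `timeslots`, `instructor`), the `TBA` placeholder, and the comma-separated instructor strings are all assumptions made for illustration. The inline `SAMPLE` is made-up data, so inspect the actual files in the repository before relying on any of it.

```python
import json

# Made-up sample in an ASSUMED layout; the real QuACS files may differ.
SAMPLE = """
[
  {"code": "CSCI",
   "courses": [
     {"id": "CSCI-1200", "title": "Data Structures",
      "sections": [
        {"timeslots": [{"instructor": "Instructor A"}]},
        {"timeslots": [{"instructor": "Instructor A, Instructor B"}]}
      ]}
   ]},
  {"code": "MATH", "courses": []}
]
"""


def course_instructor_pairs(departments, subject="CSCI"):
    """Collect unique (course id, instructor) pairs for one subject code."""
    pairs = set()
    for dept in departments:
        if dept.get("code") != subject:
            continue
        for course in dept.get("courses", []):
            for section in course.get("sections", []):
                for slot in section.get("timeslots", []):
                    # Assumption: several instructors are joined with commas.
                    for name in slot.get("instructor", "").split(","):
                        name = name.strip()
                        if name and name != "TBA":
                            pairs.add((course["id"], name))
    return pairs


pairs = course_instructor_pairs(json.loads(SAMPLE))
print(sorted(pairs))
```

For a real file, `json.load(open(path))` would replace `json.loads(SAMPLE)`, where `path` points at whichever file in the repository you decide to use.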

Your task is to design a few new and improved graph visualizations and execute them using Graphviz:

  1. A directed graph showing a typical or specific student progressing through the CSCI degree requirements over ~8 semesters. You can draw the exact course plan you followed, what you should have taken (if you had a time machine), or the course plan you would recommend to a friend who is just starting their studies at RPI.

    Note: You can include the complexity of "the Summer Arch" or ignore it altogether. You can choose how to handle non-CSCI courses: what to include, what to display, what to omit or simplify.

    Carefully consider your audience for this visualization -- students and their advisors who are making course planning and registration decisions. Does your visualization help them understand what modifications are possible? Which classes can be taken early or delayed? Maybe you can indicate which courses and terms are more challenging -- workload, time commitment, etc.

  2. A bipartite graph showing two types of nodes: CSCI courses and CSCI instructors, where an edge is drawn between a course and an instructor if the instructor has taught the course at least once. There are no edges between instructors, or between courses. This graph should reveal how faculty and courses cluster by teaching/research interests and specializations.

    The QuACS data is large (it includes data from the last 25 years!). You should simplify the data to make sure the visualization is legible. E.g., only include faculty who have taught at least one CSCI course in the last year, only include CSCI classes that have a minimum of 20 students, etc.

  3. For extra credit you can design and execute an additional interesting graph of your choice from this data (or subset of this data).
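For the bipartite graph in item 2, one possible way to go from (course, instructor) pairs to DOT is sketched below in Python. The function name `bipartite_dot` and the `min_courses` threshold are hypothetical. The threshold is just one example of a simplification rule, not one of the rules suggested above, and the shapes, colors, and `neato` layout are arbitrary starting choices.

```python
from collections import Counter


def bipartite_dot(pairs, min_courses=1):
    """Build DOT text for a course/instructor graph.

    pairs: iterable of (course, instructor) strings.
    min_courses: keep only instructors linked to at least this many courses
                 (an illustrative filter; pick whatever rule suits your data).
    Names are assumed not to contain double quotes.
    """
    pairs = sorted(set(pairs))
    per_instructor = Counter(instructor for _, instructor in pairs)
    kept = [(c, i) for c, i in pairs if per_instructor[i] >= min_courses]
    courses = sorted({c for c, _ in kept})
    instructors = sorted({i for _, i in kept})

    lines = ["graph teaching {", "  layout=neato;", "  overlap=false;"]
    # Different shapes and fill colors distinguish the two node types.
    lines.append("  node [shape=box, style=filled, fillcolor=lightblue];")
    lines += [f'  "{c}";' for c in courses]
    lines.append("  node [shape=ellipse, style=filled, fillcolor=lightyellow];")
    lines += [f'  "{i}";' for i in instructors]
    lines += [f'  "{c}" -- "{i}";' for c, i in kept]
    lines.append("}")
    return "\n".join(lines)


example = [("Course 1", "Instructor A"), ("Course 2", "Instructor A"),
           ("Course 1", "Instructor B")]
print(bipartite_dot(example, min_courses=2))
```

With `min_courses=2`, the example drops Instructor B, who is linked to only one course, and keeps both courses because Instructor A teaches each of them.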

Experiment with the visual options (layout, color, line style, line thickness, line shape, text placement, font, etc.). What options are most successful for the different visualizations? Write an excellent caption for each of your polished graphs.
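One layout option worth trying for the semester-by-semester plan is to pin each term's courses to the same rank. The Python sketch below emits DOT that does this with `rank=same` subgraphs and an invisible chain of term labels. The function name `plan_to_dot` and the placeholder course names are hypothetical, and the example prerequisites are invented for illustration rather than taken from the real curriculum.

```python
def plan_to_dot(semesters, prereqs):
    """Build DOT text for a course plan.

    semesters: list of lists of course names, one inner list per term.
    prereqs: list of (before, after) course name pairs.
    """
    lines = ["digraph plan {", "  rankdir=LR;", "  node [shape=box];"]
    # A chain of plain-text term labels fixes the left-to-right order of the terms,
    # even for courses that have no prerequisite edges.
    labels = [f"term{i}" for i in range(1, len(semesters) + 1)]
    for i, label in enumerate(labels, start=1):
        lines.append(f'  {label} [shape=plaintext, label="Semester {i}"];')
    if len(labels) > 1:
        lines.append("  " + " -> ".join(labels) + " [style=invis];")
    # rank=same places each term's courses in the same column as its label.
    for label, courses in zip(labels, semesters):
        members = " ".join(f'"{c}";' for c in courses)
        lines.append(f"  {{ rank=same; {label}; {members} }}")
    for before, after in prereqs:
        lines.append(f'  "{before}" -> "{after}";')
    lines.append("}")
    return "\n".join(lines)


# Placeholder plan: names and prerequisite edges are invented.
plan = [["Course A", "Course B"], ["Course C"], ["Course D"]]
edges = [("Course A", "Course C"), ("Course C", "Course D")]
print(plan_to_dot(plan, edges))
```

Rendering the result with `dot` should place one column per semester. Edge color or line style could then encode whether a prerequisite link is strict or only recommended.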

Part 3: Discussion

Analyze your experience with the Graphviz tool. Were you able to successfully execute your designs? Did the tool help you automatically create interesting, moderately-complex graphs? Note any weaknesses in the tool or degeneracies with more specific inputs. Could you have done better by hand? What adjustments/improvements do you believe are necessary to the graph drawing engine of this tool?

Write a short review of the tool. How quick was the installation and learning to use the tool? What resources were most useful? What sorts of applications/datasets are most appropriate for this tool? What are the limitations of this tool? What are some suggestions/cautions to others who consider using this tool?

How to Submit

Regardless of whether you worked with a partner for part 2, each student should individually prepare their own .pdf report with embedded images. Be sure to include:

  • ~5 graphs of synthetic data demonstrating a variety of graph types and visualization styling. Write a short caption for each describing what you were trying to achieve, what was successful, and/or what needs improvement.

  • Your source code for part 1 that you used to generate synthetic input data. Note: We won't try to run your code when grading, so don't include any 3rd party libraries you may have used or the executable. The code should have some organization and basic comments, but you do not need to prepare detailed documentation.

  • 3 polished graphs of the course data with carefully written captions. Be sure to point out interesting observations you discovered about the course and instructor data that the viewer can see in your visualizations.

  • The name of your partner -- if you worked with a partner for part 2.

  • Your source code for part 2 to process the QuACS data and prepare the input for your course and instructor graph visualizations.

  • Your review and discussion of the Graphviz tool.

Finally, share your favorite, most-successful image from the assignment on the Submitty Forum. Make sure your forum post includes a well-written and descriptive caption to go along with your image.