Ph.D. Theses

Mining Approximate Frequent Patterns from Graph Databases

By Pranay Anchuri
Advisor: Mohammed Zaki
May 4, 2015

In recent times, graph mining algorithms have become an indispensable tool for analyzing networks in domains such as (i) Computational biology (ii) Infrastructure and mobile sectors (iii) Cybersecurity. Many of these real-world graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to compute some cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns.

In this thesis, we present methods for mining approximate frequent patterns from graph databases. The approximate pattern mining algorithms are two folded: First, instead of computing the exact support of a pattern we approximate the support by an upper-bound. Second, we tolerate bounded label and structural mismatches when finding matches of a pattern in the database. A common theme across both these paradigms is that the problems are NP-Hard to even approximate up to a constant factor of the optimal value. We formally prove the NP-Hardness of all these problems and also present approximation algorithms for solving the problems at scale.

During the talk, I'll present experimental results of our graph mining algorithms on three real-world datasets: (i) Configuration management databases representing the infrastructure entities and their inter-relationships in large IT companies (ii) Protein-Protein interaction network in yeast (iii) Graphs representing 3D structure of proteins.

Return to main PhD Theses page