

Research Ph.D. ThesesMining Approximate Frequent Patterns from Graph Databases
By Pranay Anchuri
In recent times, graph mining algorithms have become an indispensable tool for analyzing networks in domains such as (i) Computational biology (ii) Infrastructure and mobile sectors (iii) Cybersecurity. Many of these realworld graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to compute some cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns. In this thesis, we present methods for mining approximate frequent patterns from graph databases. The approximate pattern mining algorithms are two folded: First, instead of computing the exact support of a pattern we approximate the support by an upperbound. Second, we tolerate bounded label and structural mismatches when finding matches of a pattern in the database. A common theme across both these paradigms is that the problems are NPHard to even approximate up to a constant factor of the optimal value. We formally prove the NPHardness of all these problems and also present approximation algorithms for solving the problems at scale. During the talk, I'll present experimental results of our graph mining algorithms on three realworld datasets: (i) Configuration management databases representing the infrastructure entities and their interrelationships in large IT companies (ii) ProteinProtein interaction network in yeast (iii) Graphs representing 3D structure of proteins. Return to main PhD Theses page 

