* Faculty       * Staff       * Contact       * Institute Directory
* Research Groups      
* Undergraduate       * Graduate       * Institute Admissions: Undergraduate | Graduate      
* Events       * Institute Events      
* Lab Manual       * Institute Computing      
No Menu Selected

* Research

Ph.D. Theses

Mining Subspace and Boolean Patterns from Data

By Lizhuang Zhao
Advisor: Mohammed J. Zaki
September 27, 2006

We set up a systematic framework for mining subspace and boolean patterns from data. Subspace patterns are extracted from real-valued datasets, whereas Boolean patterns are mined from binary-valued datasets. For real-valued datasets, we mainly consider scaling and shifting relationships, whereas for binary-valued datasets, we mine arbitrary boolean expressions (OR, AND, CNF, DNF). Boolean patterns are also used to mine redescriptions, i.e., to describe the same group of objects in multiple ~Sorthogonal ways~T.

The subspace pattern mining has been tailored to gene microarray data clustering to find biclusters and triclusters. We present two novel deterministic clustering algorithms: MICROCLUSTER and TRICLUSTER (the first 3D microarray subspace clustering), which can mine arbitrarily positioned and overlapping biclusters/triclusters. Depending on different parameter values, they can mine different types of clusters, including those with constant or similar row/column values, as well as scaling and shifting expression patterns. Optionally, the two algorithms merge/prune some clusters having large overlaps. We also give a useful set of metrics to evaluate the clustering quality, and show their effectiveness on real microarray expression data.

We also present a comprehensive framework, BLOSOM, for mining boolean patterns from binary-valued datasets. This framework forms the algorithmic core of a redescription mining solution. We organize the space of boolean expressions into four categories; pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. Further, the closed generators form a lossless representation of all possible boolean expressions and, hence, all possible redescriptions. BLOSOM efficiently mines several forms of frequent boolean expressions by utilizing a number of methodical pruning techniques.

* Return to main PhD Theses page