KDD 2011 Workshop CD-Proceedings

10th International Workshop on Data Mining in Bioinformatics (BIOKDD 2011)

Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technologies such as microarrays and mass spectrometry have further given rise to the fields of functional genomics and proteomics, in which one can quantitatively monitor the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the fact that answers to biological questions are present in the observed data despite noise, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Besides these, new technologies such as next-generation sequencing are producing massive amounts of sequence data, and managing, mining, and compressing these data raise challenging issues. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that bioinformatics offers. This year, the workshop will feature the theme of Data Mining Challenges in Next-generation Sequencing (NGS). NGS is revolutionizing biological, biomedical, and health research. The technology poses enormous data analysis and knowledge discovery challenges, including expression analysis, mutational analysis, alternative splicing pattern discovery, whole-transcriptome sequence alignment, epigenetic site discovery, storage and compression of high-volume sequence data, and clustering and classification of structural variations in a population.

Schedule

Workshop Schedule at a Glance
Sunday, August 21, 2011

8:25-9:25	Opening Remarks
	Invited Speaker presentation 1
9:30-10:10	Algorithm for Low-Variance Biclusters to Identify Coregulation Modules in Sequencing Datasets
		Zhen Hu and Raj Bhatnagar
	Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
		Mina Maleki, Md. Mominul Aziz and Luis Rueda
10:10-10:30	Coffee break
10:30-11:25	Invited Speaker presentation 2
11:30-12:45	Using Physicochemical Properties of Amino Acids to induce Graphical Models of Residue Couplings
		K.S.M. Tozammel Hossain, Chris Bailey-Kellogg, Alan Friedman, Michael Bradley, Nathan Baker and Naren Ramakrishnan
	Analyze Influenza Virus Sequences Using Binary Encoding Approach
		Hamching Lam and Daniel Boley
	A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data
		Ankit Agrawal, Sanchit Misra, Ramanathan Narayanan, Lalith Polepeddi and Alok Choudhary
	Closing Remarks

Invited Speakers

Dr. Vineet Bafna, Professor, University of California, San Diego

Dr. Harry Gao, Director, DNA Sequencing/Solexa Core Lab, City of Hope

Table of Contents

Algorithm for Low-Variance Biclusters to Identify Coregulation Modules in Sequencing Datasets
Zhen Hu (University of Cincinnati)
Raj Bhatnagar (University of Cincinnati)

Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
Mina Maleki (University of Windsor)
Md. Mominul Aziz (University of Windsor)
Luis Rueda (University of Windsor)

Using Physicochemical Properties of Amino Acids to induce Graphical Models of Residue Couplings
K.S.M. Tozammel Hossain (Virginia Tech)
Chris Bailey-Kellogg (Dartmouth College)
Alan Friedman (Purdue University)
Michael Bradley (Yale University)
Nathan Baker (Pacific Northwest National Laboratory)
Naren Ramakrishnan (Virginia Tech)

Analyze Influenza Virus Sequences Using Binary Encoding Approach
Hamching Lam (University of Minnesota)
Daniel Boley (University of Minnesota)

A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data
Ankit Agrawal (Northwestern University)
Sanchit Misra (Northwestern University)
Ramanathan Narayanan (Northwestern University)
Lalith Polepeddi (Northwestern University)
Alok Choudhary (Northwestern University)

Organizers

Mohammad Al Hasan, Indiana University–Purdue University Indianapolis
Jun (Luke) Huan, University of Kansas
Jake Y. Chen, Indiana University–Purdue University Indianapolis
Mohammed J. Zaki, Rensselaer Polytechnic Institute