Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technologies such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can quantitatively monitor the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the fact that biological answers can be found in the observed data despite noise, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.
While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Besides these, new technologies such as next-generation sequencing are producing massive amounts of sequence data; managing, mining, and compressing these data raises challenging issues. Data mining will play an essential role in understanding these fundamental problems and in the development of novel therapeutic and diagnostic solutions in post-genome medicine.
The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers.
This year, the workshop will feature the theme of Data Mining Challenges in Next-generation Sequencing (NGS). NGS is revolutionizing biological, biomedical, and health research. NGS technology poses enormous data analysis and knowledge discovery challenges, including expression analysis, mutational analysis, alternative splicing pattern discovery, whole-transcriptome sequence alignment, epigenetic site discovery, storage and compression of high-volume sequence data, and clustering and classification of structural variations in a population.
Table of Contents
Algorithm for Low-Variance Biclusters to Identify Coregulation Modules in Sequencing Datasets
Zhen Hu (University of Cincinnati)
Raj Bhatnagar (University of Cincinnati)
Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
Mina Maleki (University of Windsor)
Md. Mominul Aziz (University of Windsor)
Luis Rueda (University of Windsor)
Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
K.S.M. Tozammel Hossain (Virginia Tech)
Chris Bailey-Kellogg (Dartmouth College)
Alan Friedman (Purdue University)
Michael Bradley (Yale University)
Nathan Baker (Pacific Northwest National Laboratory)
Naren Ramakrishnan (Virginia Tech)
Analyze Influenza Virus Sequences Using Binary Encoding Approach
Hamching Lam (University of Minnesota)
Daniel Boley (University of Minnesota)
A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data
Ankit Agrawal (Northwestern University)
Sanchit Misra (Northwestern University)
Ramanathan Narayanan (Northwestern University)
Lalith Polepeddi (Northwestern University)
Alok Choudhary (Northwestern University)