Workshop on Large-Scale Parallel KDD Systems
August 15th, 1999, San Diego, CA, USA
in conjunction with
ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD99)
Online Proceedings on ACM Server
With the unprecedented rate at which data is being collected today
in almost all fields of human endeavor, there is an emerging economic and
scientific need to extract useful information from it. Many companies already
have data warehouses in the terabyte range (e.g., FedEx, UPS, and
Walmart). Implementation of data mining ideas in high-performance parallel
and distributed computing environments is thus crucial for ensuring system
scalability and interactivity.
The goal of this workshop is to bring researchers and practitioners
together in a setting where they can discuss the design, implementation,
and deployment of large-scale, parallel knowledge discovery (PKD)
systems, which can manipulate data taken from very large enterprise or
scientific (e.g., space missions or the Human Genome Project)
databases, regardless of whether the data are located centrally or
are globally distributed. Relevant topics for the workshop
include:
- How to develop a rapid-response, scalable, and parallel knowledge
discovery system that supports global organizations with terabytes of
data.
- How to address some of the challenges facing current state-of-the-art data
mining tools. These challenges include freeing the user from time- and
volume-constrained tool sets, evolving knowledge stores with new
knowledge effectively, acquiring data elements from heterogeneous sources
such as the Web or other repositories, and enhancing the PKD process by
incrementally updating the knowledge stores.
- How to leverage high-performance parallel and distributed techniques
in all the phases of KDD, such as initial data selection, cleaning and
preprocessing, transformation, data-mining task and algorithm
selection and application, pattern evaluation, management of discovered
knowledge, and tight coupling between the mining engine and the
database/file server.
- How to facilitate user interaction and usability, allow the
representation of domain knowledge, and maximize understanding
during and after the process. That is, how to build an adaptable
PKD engine that supports business decisions and product creation
and evolution, and that turns information into usable or actionable
knowledge.
Online Proceedings
- 8:40am, Opening
- 8:45-9:25am,
Invited Talk: Collection-Based Data Management
Reagan Moore, San Diego Supercomputer Center, USA
- 9:25-10:15am, Session I: Mining Frameworks
- A high performance implementation of the data space
transfer protocol (DSTP)
- S. Bailey, E. Creel, R. Grossman, S. Gutti, H. Sivakumar,
University of Illinois-Chicago, USA
- Active data mining in a distributed setting
- S. Parthasarathy, S. Dwarkadas, M. Ogihara,
University of Rochester, USA
- 10:15-10:30am, Coffee Break
- 10:30-11:10am,
Invited Talk: Large-Scale Data Mining Applications:
Requirements and Architectures
Umeshwar Dayal, Hewlett-Packard Corp., USA
- 11:10am-12:00pm,
Session II: Association Rules
- 12:00-2:00pm, Lunch Break
- 2:00-2:40pm,
Invited Talk:
Integrated Delivery of Large-Scale Data Mining
Systems
Graham Williams, CSIRO, Australia
- 2:40-3:30pm, Session III: Clustering and Sequences
- 3:30-3:45pm, Coffee Break
- 3:45-4:25pm,
Invited Talk: Communicating Data Mining:
Issues and Challenges in Wide Area Distributed Data Mining
Bob Grossman and Yike Guo,
University of Illinois-Chicago, USA and Imperial College, UK
- 4:25-5:15pm, Session IV: Classification
- 5:15-6:15pm,
Panel: Large-Scale Data Mining: Where is it Headed?
- Vipin Kumar, University of Minnesota
- Ron Musick, Lawrence Livermore National Labs
- Foster Provost, Bell Atlantic
- Mohammed Zaki (moderator), Rensselaer Polytechnic Institute
- 6:15pm, Closing
Registration:
All registrants to the SIGKDD conference are eligible to participate
in the workshop.
There is no separate registration fee
for the workshop, but attendance is
by invitation only, and the number of participants
will be limited to 60.
To register for the workshop please send an email to one of the
workshop chairs, expressing your interest in the workshop. A brief
statement of your research/work interests will be helpful. Of course,
you must also register for the main SIGKDD conference.
A list of the registrants so far is available: List of Registrants.
Workshop Chairs:
Dr. Mohammed J. Zaki
Computer Science Department
Rensselaer Polytechnic Institute
Troy NY 12180
zaki.AT.cs.rpi.edu
Dr. Ching-Tien (Howard) Ho
IBM Almaden Research Center
650 Harry Road
San Jose CA 95120
ho@almaden.ibm.com
Program Committee:
David Cheung, University of Hong Kong, Hong Kong
Alok Choudhary, Northwestern University
Alex A. Freitas, PUC-PR, Brazil
Robert Grossman, University of Illinois-Chicago
Yike Guo, Imperial College, UK
Hillol Kargupta, Washington State University
Masaru Kitsuregawa, University of Tokyo, Japan
Vipin Kumar, University of Minnesota
Reagan Moore, San Diego Supercomputer Center
Ron Musick, Lawrence Livermore National Lab
Srini Parthasarathy, University of Rochester
Sanjay Ranka, University of Florida
Arno Siebes, CWI, Netherlands
David Skillicorn, Queen's University, Canada
Paul Stolorz, Jet Propulsion Lab
Graham Williams, CSIRO, Australia