WORKSHOP CO-CHAIRS:
  • Mohammed J. Zaki, Rensselaer Polytechnic Institute (zaki.AT.cs.rpi.edu)
  • Vipin Kumar, University of Minnesota (kumar@cs.umn.edu)
  • David Skillicorn, Queen's University, Canada (skill@cs.queensu.ca)

    PROGRAM COMMITTEE:

  • Philip Chan, Florida Institute of Technology 
  • David Cheung, University of Hong Kong, Hong Kong 
  • Alok Choudhary, Northwestern University 
  • Alex A. Freitas, PUC-PR (Pontifical Catholic University of Parana), Brazil 
  • Johannes Gehrke, Cornell University 
  • Robert Grossman, University of Illinois-Chicago 
  • Yike Guo, Imperial College, UK 
  • Howard Ho, IBM Almaden Research Center 
  • Chandrika Kamath, Lawrence Livermore National Lab 
  • Hillol Kargupta, Washington State University 
  • Masaru Kitsuregawa, University of Tokyo, Japan 
  • Bill Maniatty, State University of New York-Albany 
  • Ron Musick, Lawrence Livermore National Lab 
  • Yi Pan, Georgia State University 
  • Srini Parthasarathy, Ohio State University
  • Foster Provost, New York University 
  • Arno Siebes, CWI, Netherlands 
  • Domenico Talia, ISI-CNR, Rende, Italy 
  • Albert Zomaya, University of Western Australia

    PDDM 2001

    4th International Workshop on Parallel and Distributed Data Mining
    April 27, 2001
    San Francisco, CA, USA

    in conjunction with

    15th International Parallel and Distributed Processing Symposium (IPDPS'2001)

    Workshop History: This is the 4th workshop on this theme, held annually in conjunction with the IPDPS conference. The first three workshops went under the name "High Performance Data Mining" and were held in Orlando (HPDM'98), San Juan (HPDM'99), and Cancun (HPDM'00). In keeping with the growing popularity and international scope of this field, the workshop has been renamed "International Workshop on Parallel and Distributed Data Mining".

    As the volume of data increases, it is clear that both parallel and distributed data mining techniques are required to make the whole knowledge discovery process scalable and interactive. This workshop will target papers on high-performance parallel and distributed methods, as well as mining on distributed and heterogeneous datasets. Topics of interest include:

    • Efficient, scalable, disk-based, parallel and distributed algorithms for large-scale data mining tasks.
    • New algorithms for common data mining methods such as association rules, sequences, classification, clustering, deviation detection, etc.
    • Pre-processing and post-processing operations like sampling, feature selection, data reduction and transformation, rule grouping and pruning, etc.
    • Incremental, exploratory and interactive mining
    • Meta-mining, coping with distributed and/or heterogeneous datasets.
    • Integration of mining with parallel/distributed databases and data warehouses.
    • Mining non-traditional datasets, such as large scientific databases.
    • Frameworks for KDD systems, and parallel or distributed mining.
    • Agent-based approaches for PDDM.
    • Applications of PDDM in business, science, engineering, medicine, and other disciplines. 
    • Theoretical foundations of PDDM.

    WORKSHOP SCHEDULE:

    9:00-9:15    Opening Remarks
    9:15-10:00   Keynote Talk
    10:00-10:30  Coffee Break
    10:30-12:00  Session I
    12:00-13:30  Lunch
    13:30-14:15  Invited Talk
    14:15-15:15  Session II
    15:15-15:20  Concluding Remarks
    15:20-15:30  Coffee Break

    SESSION INFORMATION:

    Keynote Talk: Scalable Parallel Data Mining for High-Dimensional Data, Alok Choudhary, Northwestern University
    Abstract: Large-scale data analysis and data mining on warehouses (where huge amounts of time-varying observational, transactional, or simulation data are stored) pose many challenges. The stored data is typically multidimensional, with a large number of dimensions, and in many cases it is highly sparse. Parallel processing techniques have become important for enabling the use of larger data sets and reducing the time for analysis and knowledge discovery. In this talk, I will briefly present PARSIMONY, a system that provides an infrastructure as well as scalable algorithms for the analysis and mining of large multidimensional data. In particular, I will present MAFIA, a scalable parallel clustering algorithm for high-dimensional data.

    Session I:

  • Efficient Data Mining: Scripting and Scalable Parallel Algorithms, Peter Christen, Markus Hegland, Ole M. Nielsen, Stephen Roberts, Peter Strazdins, Tatiana Semenova, Irfan Altas
  • An efficient association mining implementation on clusters of SMP, Ruoming Jin and Gagan Agrawal
  • Implementation and performance evaluation of dynamic scheduling for parallel decision tree generation, Kazuto Kubota, Akihiko Nakase and Shigeru Oyanagi

    Invited Talk: Ubiquitous Mining of Distributed Data, Hillol Kargupta, University of Maryland Baltimore County
    Abstract: Knowledge discovery and data mining deal with the problem of extracting interesting associations, classifiers, clusters, and other patterns from data. The emergence of network-based environments has introduced an important new dimension to this problem: distributed sources of data and computing. The advent of laptops, palmtops, handhelds, and wearable computers is making ubiquitous access to large quantities of distributed data a reality. Advanced analysis of distributed data for extracting useful knowledge is the next natural step in the increasingly connected world of ubiquitous computing. However, this will not come for free; it will introduce additional costs due to communication, computation, and security, among others. Distributed data mining (DDM) offers the capability to analyze distributed data while minimizing these costs to maintain the ubiquitous presence. This talk will explain the Collective Data Mining (CDM) approach to DDM, which offers a collection of scalable distributed data analysis techniques. It will present an overview of the CDM technology and its applications.

    Session II:

  • Towards Network-Aware Data Mining, Srinivasan Parthasarathy
  • Incremental Quantitative Rule Derivation by Multidimensional Data Partitioning, Junping Sun

    Maintained by: Mohammed J. Zaki <zaki.AT.cs.rpi.edu>