Data mining and KDD: Promise and challenges

Abstract

Databases are growing in size to the point where traditional techniques for analyzing and visualizing the data are breaking down. Data mining and knowledge discovery in databases (KDD) are concerned with extracting models and patterns of interest from large databases. Data mining techniques have their origins in methods from statistics, pattern recognition, databases, artificial intelligence, high-performance and parallel computing, and visualization. In this article, we provide an overview of this growing multidisciplinary research area, outline the basic techniques, and briefly cover how they are used in some applications. We discuss the role of high-performance and parallel computing in data mining problems and give a brief overview of a few applications in scientific data analysis. We conclude by listing challenges and opportunities for future research.
