Data mining and knowledge discovery in databases: implications for scientific databases

Data mining and knowledge discovery in databases (KDD) promise to play an important role in the way people interact with databases, especially scientific databases, where analysis and exploration operations are essential. The author defines the basic notions in data mining and KDD, states their goals and motivation, and gives a high-level definition of the KDD process and how it relates to data mining. The author then focuses on data mining methods, covering a sample of them to illustrate what they are and how they are used. A case study of a successful application in science data analysis follows: the classification and cataloging of a major astronomy sky survey covering 2 billion objects in the northern sky. On the task of recognizing faint stars and galaxies, the system can outperform both humans and classical computational analysis tools in astronomy. The author also covers the problem of scaling a clustering problem to a large catalog database of billions of objects.
