From Digitized Images to Online Catalogs: Data Mining a Sky Survey

The value of a scientific digital-image library seldom lies in the pixels of its images. For large image collections, such as those produced by astronomical sky surveys, the typical useful product is an online database that catalogs the entries of interest. We focus on automating the cataloging effort of a major sky survey and on the availability of digital libraries in general. The SKICAT system automates the reduction and analysis of three terabytes of images, which are expected to contain on the order of 2 billion sky objects. The primary scientific analysis of these data requires detecting, measuring, and classifying every sky object. SKICAT integrates techniques for image processing, classification learning, database management, and visualization. The learning algorithms are trained to classify the detected objects and can classify objects too faint for visual classification with an accuracy exceeding 90 percent. This accuracy level increases the number of classified objects in the final catalog threefold relative to the best results from digitized photographic sky surveys to date. Learning algorithms thus played a powerful, enabling role: they solved a difficult, scientifically significant problem, making possible consistent, accurate classification of an otherwise unfathomable data set, along with easy access to it and analysis of it.
