Data mining techniques are gaining popularity in various scientific domains as viable approaches to the analysis of massive data sets. In this chapter, we describe our experiences in applying data mining to a problem in astronomy, namely, the identification of radio-emitting galaxies with a bent-double morphology. Until recently, astronomers associated with the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through a visual inspection of images. While this manual approach has been very subjective and tedious, it is also becoming increasingly infeasible as the survey grows in size. Upon completion, FIRST will include almost a million galaxies, making the use of semi-automated analysis methods necessary. We describe the FIRST data set and the problem of identifying bent-double galaxies. We discuss our solution approach, focusing on the challenges we face in applying data mining to a scientific data set. We explain why, in contrast with most commercial data mining applications, data preprocessing requires considerable effort in scientific applications. We then describe our ongoing work on detecting bent-double galaxies using decision tree classifiers. Our results indicate that data mining techniques, steered by proper domain knowledge, can greatly enhance the manual exploration of massive data sets.