Ensembles of Classifiers for Morphological Galaxy Classification

We compare three algorithms for automated morphological galaxy classification on a sample of 800 galaxies. Classifiers are trained either on a single training set or on bootstrap replicates of that training set, the latter producing an ensemble of classifiers. We use a Naive Bayes classifier, a neural network trained with backpropagation, and a decision-tree induction algorithm with pruning. Previous work in the field has emphasized backpropagation networks and decision trees; the Naive Bayes classifier is included because it is easy to understand and implement and often works remarkably well on real-world data. For each algorithm, we examine the classification accuracy of individual classifiers using 10-fold cross-validation, and of ensembles of classifiers trained on 25 bootstrap data sets and tested on the same cross-validation test sets. Our results show that (1) the neural network produced the best individual classifiers (lowest classification error) in the majority of cases, (2) the ensemble approach significantly reduced the classification error for the neural network and decision-tree classifiers but not for the Naive Bayes classifier, (3) the ensemble approach worked better for decision trees (typical error reduction of 12%-23%) than for the neural network (typical error reduction of 7%-12%), and (4) the relative improvement from using ensembles decreases as the number of output classes increases. While more extensive comparisons are needed (e.g., across a wider variety of data and classifiers), our work is the first demonstration that the ensemble approach can significantly increase the performance of certain automated classification methods when applied to morphological galaxy classification.
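The evaluation protocol described above (individual classifiers versus ensembles of 25 bootstrap-trained classifiers, both scored on the same 10-fold cross-validation test sets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the nearest-centroid base learner is a stand-in chosen for brevity and is not one of the three algorithms studied, combining members by majority vote is an assumption, and the synthetic two-class data and all function names are hypothetical.

```python
import random
from collections import Counter

def make_toy_data(n=200, seed=0):
    """Hypothetical synthetic data: two Gaussian classes in two dimensions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]

def train_centroid(data):
    """Stand-in base learner (assumption): mean feature vector per class."""
    sums, counts = {}, Counter()
    for x, y in data:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest in squared distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

def train_ensemble(data, n_members=25, seed=0):
    """Train one classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [train_centroid(bootstrap_sample(data, rng)) for _ in range(n_members)]

def predict_ensemble(models, x):
    """Combine member predictions by majority vote (assumed combination rule)."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k disjoint test folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(data, k=10, n_members=25, seed=0):
    """Return (single-classifier error, ensemble error) on the same test folds."""
    single_err = ens_err = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        single = train_centroid(train)
        ensemble = train_ensemble(train, n_members, seed)
        single_err += sum(predict_centroid(single, x) != y for x, y in test)
        ens_err += sum(predict_ensemble(ensemble, x) != y for x, y in test)
    return single_err / len(data), ens_err / len(data)

if __name__ == "__main__":
    single, ensemble = cross_validate(make_toy_data())
    print(f"single classifier error: {single:.3f}")
    print(f"ensemble error:          {ensemble:.3f}")
```

Swapping `train_centroid` and `predict_centroid` for another learner leaves the bootstrap and cross-validation scaffolding unchanged, which is the sense in which the same test sets are reused for the individual and ensemble comparisons.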
