ENSEMBLES OF CLASSIFIERS FOR MORPHOLOGICAL GALAXY CLASSIFICATION

We compare the use of three algorithms for performing automated morphological galaxy classification using a sample of 800 galaxies. Classifiers are created using a single training set as well as bootstrap replicates of the training set, producing an ensemble of classifiers. We use a Naive Bayes classifier, a neural network trained with backpropagation, and a decision-tree induction algorithm with pruning. Previous work in the field has emphasized backpropagation networks and decision trees. The Naive Bayes classifier is easy to understand and implement and often works remarkably well on real-world data. For each of these algorithms, we examine the classification accuracy of individual classifiers using 10-fold cross validation and of ensembles of classifiers trained using 25 bootstrap data sets and tested on the same cross-validation test sets. Our results show that (1) the neural network produced the best individual classifiers (lowest classification error) for the majority of cases, (2) the ensemble approach significantly reduced the classification error for the neural network and the decision-tree classifiers but not for the Naive Bayes classifier, (3) the ensemble approach worked better for decision trees (typical error reduction of 12%–23%) than for the neural network (typical error reduction of 7%–12%), and (4) the relative improvement when using ensembles decreases as the number of output classes increases. While more extensive comparisons are needed (e.g., a variety of data and classifiers), our work is the first demonstration that the ensemble approach can significantly increase the performance of certain automated classification methods when applied to the domain of morphological galaxy classification.

Subject headings: galaxies: fundamental parameters – methods: data analysis – methods: numerical
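The evaluation protocol described in the abstract (individual classifiers versus ensembles built from 25 bootstrap replicates, both scored on the same 10-fold cross-validation test sets) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the synthetic two-class data from `make_toy_data`, the nearest-centroid base learner `train_centroid` (a stand-in for the Naive Bayes, neural network, and decision-tree learners), majority voting as the combination rule, and all function names.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Hypothetical synthetic data: two well-separated 2-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 5.0 * label
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, label))
    return data


def bootstrap_replicate(data, rng):
    """Draw len(data) examples from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_centroid(data):
    """Stand-in base learner (an assumption): nearest-centroid classifier."""
    sums, counts = {}, Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Return the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def train_ensemble(data, n_replicates=25, seed=0):
    """Train one classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [train_centroid(bootstrap_replicate(data, rng))
            for _ in range(n_replicates)]


def ensemble_predict(members, x):
    """Combine member predictions by majority vote (assumed rule)."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


def cross_validate(data, k=10, n_replicates=25, seed=0):
    """Return (single-classifier error, ensemble error) over k folds."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    single_errors = ensemble_errors = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        single = train_centroid(train)
        ensemble = train_ensemble(train, n_replicates, seed + i)
        # Both are scored on the same held-out fold.
        single_errors += sum(single(x) != y for x, y in test)
        ensemble_errors += sum(ensemble_predict(ensemble, x) != y
                               for x, y in test)
    return single_errors / len(data), ensemble_errors / len(data)


if __name__ == "__main__":
    single_err, ensemble_err = cross_validate(make_toy_data())
    print(f"single: {single_err:.3f}  ensemble: {ensemble_err:.3f}")
```

The sketch only shows the mechanics of the comparison. Error rates on this toy data say nothing about the galaxy sample, and a different base learner could be swapped in by replacing `train_centroid` with any function that returns a `predict` callable.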
