Machine and Deep Learning applied to galaxy morphology - A comparative study

Morphological classification is a key piece of information for defining galaxy samples used to study the large-scale structure of the universe. In essence, the challenge is to build a robust methodology that yields reliable morphological estimates from galaxy images. Here, we investigate how to substantially improve galaxy classification in large datasets by mimicking human classification. We combine accurate visual classifications from the Galaxy Zoo project with machine and deep learning methodologies. We propose two distinct approaches to galaxy morphology: one based on non-parametric morphology and traditional machine learning algorithms, and another based on Deep Learning. To measure the input features for the traditional machine learning methodology, we have developed a system called CyMorph, with a novel non-parametric approach to studying galaxy morphology. The main dataset employed comes from the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7). We also discuss the class imbalance problem considering three classes. The performance of each model is measured mainly by Overall Accuracy (OA). A spectroscopic validation with astrophysical parameters is also provided for the Decision Tree models to assess the quality of our morphological classification. In all of our samples, both the Deep and Traditional Machine Learning approaches achieve over 94.5% OA when classifying galaxies into two classes (elliptical and spiral). We provide a catalog of ~670,000 galaxies containing our best results, including morphological metrics and classifications (supplementary data link). We compare our classification with state-of-the-art morphological classifications from the literature.
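As a minimal illustration of the OA metric and of why class imbalance matters when reading it, the sketch below computes OA together with per-class recall on a small, made-up set of labels. The function names, the toy labels, and the use of per-class recall as a complementary check are assumptions made for this example; they are not taken from the paper's pipeline or from CyMorph.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical, deliberately imbalanced toy sample (not real survey data):
# 8 spirals and 2 ellipticals, with a classifier that always answers "spiral".
y_true = ["spiral"] * 8 + ["elliptical"] * 2
y_pred = ["spiral"] * 10

oa = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"OA = {oa:.2f}")
print(recall)
```

In this toy case the OA looks respectable even though every elliptical is misclassified, which is one reason an imbalance discussion usually accompanies an OA figure.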
