A hybrid approach to machine learning annotation of large galaxy image databases

Modern astronomy relies on massive databases collected by robotic telescopes and digital sky surveys, which acquire data at a much faster pace than manual analysis can support. Among other data, these sky surveys collect information about millions, and sometimes billions, of extra-galactic objects. Since the very large number of objects makes manual observation impractical, automatic methods that can analyze and annotate extra-galactic objects are required to fully utilize the discovery power of these databases. Machine learning methods for annotating celestial objects can be separated broadly into methods that use the photometric information collected by digital sky surveys and methods that analyze the image of the object. Here we describe a hybrid method that combines photometry and image data to annotate galaxies by their morphology, and a method that uses that information to identify objects that are visually similar to a query object (query-by-example). The results are compared to using just the photometric information from SDSS, and to using just the morphological descriptors extracted directly from the images. The comparison shows that for automatic classification the image data add only marginally to the information provided by the photometry data. For query-by-example, however, the analysis of the image data provides additional information that improves the automatic detection substantially. The source code and binaries of the method can be downloaded through the Astrophysics Source Code Library.
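As a rough illustration of the query-by-example idea described above, the following minimal sketch concatenates photometric and image-derived feature vectors and ranks objects by their distance to a query object. It is not the method's actual implementation: the function names `hybrid_features` and `query_by_example`, the z-score standardization, and the Euclidean distance are all assumptions chosen for simplicity.

```python
import numpy as np


def hybrid_features(photometry, image_descriptors):
    """Concatenate per-object photometric and image-derived features.

    Both inputs are 2-D arrays with one row per object (hypothetical layout).
    """
    photometry = np.asarray(photometry, dtype=float)
    image_descriptors = np.asarray(image_descriptors, dtype=float)
    return np.hstack([photometry, image_descriptors])


def query_by_example(features, query_index, top_k=3):
    """Return indices of the top_k objects most similar to the query object.

    Similarity is assumed here to be Euclidean distance on standardized
    features; the published method may use a different measure.
    """
    features = np.asarray(features, dtype=float)
    # Standardize each column so that no single feature dominates the distance.
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (features - features.mean(axis=0)) / std
    dist = np.linalg.norm(z - z[query_index], axis=1)
    dist[query_index] = np.inf  # exclude the query object itself
    return np.argsort(dist)[:top_k]
```

Running `query_by_example` once on the photometric columns alone, once on the image descriptors alone, and once on the output of `hybrid_features` would give a simple way to compare the three feature sets in the manner the abstract describes.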
