Galaxy detection and identification using deep learning and data augmentation

Abstract We present a method for the automatic detection and classification of galaxies that includes a novel data-augmentation procedure to make trained models more robust to data taken with different instruments and rendered with different contrast-stretching functions. The method is presented as part of AstroCV, a growing open source computer vision repository for processing and analyzing big astronomical datasets, which includes high performance Python and C++ algorithms for image processing and computer vision. The underlying models were trained using convolutional neural networks and deep learning techniques, which provide better results than methods based on manual feature engineering and SVMs in most cases where large training datasets are available. The detection and classification methods were trained end-to-end using public datasets such as the Sloan Digital Sky Survey (SDSS) and the Galaxy Zoo, and private datasets such as the Next Generation Virgo Survey (NGVS) and the Next Generation Fornax Survey (NGFS). Training results depend strongly on the method used to convert the raw FITS data for each band into a 3-channel color image. We therefore propose augmenting the training data using 5 conversion methods. This greatly improves overall galaxy detection and classification for images produced from different instruments, bands and data reduction procedures. The detection and classification methods were trained using the deep learning framework DARKNET and the real-time object detection system YOLO. These methods are implemented in C and CUDA, and make intensive use of graphics processing units (GPUs). Using a single high-end Nvidia GPU card, the method can process an SDSS image in 50 ms and a DECam image in less than 3 s. We provide the open source code, documentation, pre-trained networks, Python tutorials, and instructions for training on your own datasets in the AstroCV repository: https://github.com/astroCV/astroCV
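The augmentation idea described above can be illustrated with a minimal Python sketch. It is not the AstroCV implementation: the abstract does not name the 5 conversion methods, so the four stretch functions below (linear, square root, logarithmic, asinh), their scaling constants, and the helper names `normalize`, `to_rgb` and `augment` are all assumptions chosen for illustration. Plain lists stand in for image arrays so that the example needs only the standard library.

```python
import math

def normalize(pixels, lo, hi):
    """Clip pixel values to [lo, hi] and rescale them linearly to [0, 1]."""
    span = hi - lo
    return [min(max((p - lo) / span, 0.0), 1.0) for p in pixels]

# Hypothetical contrast-stretching functions, each mapping [0, 1] -> [0, 1].
# These are illustrative choices, not the five methods used in the paper.
STRETCHES = {
    "linear": lambda x: x,
    "sqrt": lambda x: math.sqrt(x),
    "log": lambda x: math.log1p(1000.0 * x) / math.log1p(1000.0),
    "asinh": lambda x: math.asinh(10.0 * x) / math.asinh(10.0),
}

def to_rgb(bands, lo, hi, stretch):
    """Combine three band pixel lists into a list of 8-bit (R, G, B) tuples."""
    f = STRETCHES[stretch]
    channels = [[round(255 * f(v)) for v in normalize(b, lo, hi)] for b in bands]
    return list(zip(*channels))

def augment(bands, lo, hi):
    """Return one RGB rendering of the same field per stretch function."""
    return {name: to_rgb(bands, lo, hi, name) for name in STRETCHES}
```

Because every rendering comes from the same pixels, object positions are unchanged, so one set of bounding-box labels could be reused for all renderings of a field. For example, `augment((r_band, g_band, b_band), 0.0, 1.0)` would return four color versions of a single three-band field, keyed by stretch name.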
