Scalable End-to-end Recurrent Neural Network for Variable star classification

During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine-learning techniques. Traditionally, light curves are represented as a vector of descriptors or features that serves as input for many algorithms. Some features are computationally expensive and cannot be updated quickly, and hence cannot be applied to large data sets such as the LSST. Previous work has developed alternative unsupervised feature extraction algorithms for light curves, but their cost remains high. In this work, we propose an end-to-end algorithm that automatically learns a representation of light curves that allows accurate automatic classification. We study a series of deep learning architectures based on recurrent neural networks and test them in automated classification scenarios. Our method uses minimal data pre-processing, can be updated at low computational cost for new observations and light curves, and can scale up to massive data sets. We transform each light curve into an input matrix whose elements are the differences in time and magnitude between consecutive observations; the outputs are classification probabilities. We test our method on three surveys: OGLE-III, Gaia, and WISE. We obtain accuracies of about 95 per cent in the main classes and 75 per cent in the majority of subclasses. We compare our results with the Random Forest classifier and obtain competitive accuracies while being faster and more scalable. The analysis shows that the computational complexity of our approach grows linearly with the light-curve size, while the cost of the traditional approach grows as $N\log(N)$.
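The input transformation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `to_delta_matrix` is hypothetical, and it assumes that the differences are taken between consecutive observations after sorting by time, giving one row per pair of observations with columns (Δt, Δm).

```python
import numpy as np


def to_delta_matrix(times, mags):
    """Turn a light curve into an (N-1, 2) matrix of consecutive differences.

    Hypothetical helper for illustration: column 0 holds the time differences
    and column 1 the magnitude differences between neighbouring observations.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    if times.shape != mags.shape or times.ndim != 1:
        raise ValueError("times and mags must be 1-D arrays of equal length")

    # Assumption: observations are ordered in time before differencing.
    order = np.argsort(times, kind="stable")
    times, mags = times[order], mags[order]

    # np.diff does one pass over the data, so the cost is linear in N.
    return np.stack([np.diff(times), np.diff(mags)], axis=1)


if __name__ == "__main__":
    # Toy light curve with three observations (made-up values).
    matrix = to_delta_matrix([0.0, 1.0, 3.0], [10.0, 10.5, 10.2])
    print(matrix.shape)  # one row per consecutive pair of observations
```

Under these assumptions a new observation adds a single row to the matrix, which is in line with the low update cost and linear scaling claimed in the abstract.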
