A recurrent neural network for classification of unevenly sampled variable stars

Astronomical surveys of celestial sources produce streams of noisy time series measuring flux versus time (‘light curves’). Unlike in many other physical domains, however, large (and source-specific) temporal gaps in data arise naturally due to intranight cadence choices as well as diurnal and seasonal constraints¹⁻⁵. With nightly observations of millions of variable stars and transients from upcoming surveys⁴,⁶, efficient and accurate discovery and classification techniques on noisy, irregularly sampled data must be employed with minimal human-in-the-loop involvement. Machine learning for inference tasks on such data traditionally requires the laborious hand-coding of domain-specific numerical summaries of raw data (‘features’)⁷. Here, we present a novel unsupervised autoencoding recurrent neural network⁸ that makes explicit use of sampling times and known heteroskedastic noise properties. When trained on optical variable star catalogues, this network produces supervised classification models that rival other best-in-class approaches. We find that autoencoded features learned in one time-domain survey perform nearly as well when applied to another survey. These networks can continue to learn from new unlabelled observations and may be used in other unsupervised tasks, such as forecasting and anomaly detection.
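To make "explicit use of sampling times and known heteroskedastic noise properties" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each measurement is fed to the network together with the time elapsed since the previous measurement, and that the autoencoder's reconstruction error is weighted by the inverse variance of each point. The function names `to_network_input` and `weighted_reconstruction_loss` and the exact normalisation are hypothetical choices made for illustration.

```python
import numpy as np


def to_network_input(times, flux):
    """Pair each flux value with the gap since the previous sample.

    Assumption for illustration: the network sees (delta_t, flux) pairs,
    so irregular sampling is visible to it rather than interpolated away.
    """
    times = np.asarray(times, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # The first sample has no predecessor, so its gap is set to zero.
    delta_t = np.diff(times, prepend=times[0])
    return np.column_stack([delta_t, flux])


def weighted_reconstruction_loss(flux, flux_reconstructed, sigma):
    """Inverse-variance weighted mean squared reconstruction error.

    Assumption for illustration: points with larger known measurement
    error sigma contribute less to the loss.
    """
    flux = np.asarray(flux, dtype=float)
    flux_reconstructed = np.asarray(flux_reconstructed, dtype=float)
    weights = 1.0 / np.asarray(sigma, dtype=float) ** 2
    residual_sq = (flux - flux_reconstructed) ** 2
    return float(np.sum(weights * residual_sq) / np.sum(weights))


# Toy light curve with an intranight pair, a one-day gap and a monthly gap.
times = [0.0, 0.1, 1.0, 30.5]
flux = [15.2, 15.4, 15.1, 15.3]
sigma = [0.1, 0.1, 0.5, 0.1]  # the third point is much noisier

network_input = to_network_input(times, flux)
perfect_loss = weighted_reconstruction_loss(flux, flux, sigma)
```

With this weighting, a reconstruction error of a given size on the noisy third point costs less than the same error on a precisely measured point, which is one way a network could be discouraged from fitting noise.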

[1]  A. Lapedes, et al. Nonlinear Signal Processing Using Neural Networks, 1987.

[2]  Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[3]  Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[4]  C. Bailer-Jones, et al. A package for the automated classification of periodic variable stars, 2015, arXiv:1512.01611.

[5]  Christopher W. Stubbs, et al. The MACHO Project LMC Variable Star Inventory. II. LMC RR Lyrae Stars - Pulsational Characteristics and Indications of a Global Youth of the LMC, 1996.

[6]  J. Scargle. Studies in astronomical time series analysis. II - Statistical aspects of spectral analysis of unevenly spaced data, 1982.

[7]  Joshua S. Bloom, et al. Data Mining and Machine-Learning in Time-Domain Discovery & Classification, 2011, arXiv:1104.3142.

[8]  Gracjan Maciejewski, et al. The All Sky Automated Survey. Catalog of Variable Stars. I. 0 h - 6 h Quarter of the Southern Hemisphere, 2002.

[9]  Stephen T. Ridgway, et al. THE VARIABLE SKY OF DEEP SYNOPTIC SURVEYS, 2014, arXiv:1409.3265.

[10]  Ciro Donalek, et al. Real-time data mining of massive data streams from synoptic sky surveys, 2016, Future Gener. Comput. Syst.

[11]  Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[12]  Tara N. Sainath, et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION, 2012, doi:10.1109/MSP.2012.2205597.

[13]  N. Lomb. Least-squares frequency analysis of unequally spaced data, 1976.

[14]  Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[15]  Pavlos Protopapas, et al. SUPERVISED DETECTION OF ANOMALOUS LIGHT CURVES IN MASSIVE ASTRONOMICAL CATALOGS, 2014, arXiv.

[16]  Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[17]  A. Schwarzenberg-Czerny, et al. Accuracy of period determination, 1991.

[18]  Brett Naul, et al. cesium: Open-Source Platform for Time-Series Inference, 2016, SciPy.

[19]  Nathaniel R. Butler, et al. CONSTRUCTION OF A CALIBRATED PROBABILISTIC CLASSIFICATION CATALOG: APPLICATION TO 50k VARIABLE SOURCES IN THE ALL-SKY AUTOMATED SURVEY, 2012, arXiv:1204.4180.

[20]  J. S. Stuart, et al. EXPLORING THE VARIABLE SKY WITH LINEAR. II. HALO STRUCTURE AND SUBSTRUCTURE TRACED BY RR LYRAE STARS TO 30 kpc, 2013, arXiv:1305.2160.

[21]  P. Dubath, et al. Random forest automated supervised classification of Hipparcos periodic variable stars, 2011, arXiv:1101.2406.

[22]  Sao, et al. A MACHINE-LEARNING METHOD TO INFER FUNDAMENTAL STELLAR PARAMETERS FROM PHOTOMETRIC LIGHT CURVES, 2014, arXiv:1411.1073.

[23]  Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[24]  Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[25]  Kai Lars Polsterer, et al. Featureless Classification of Light Curves, 2015, arXiv:1504.04455.

[26]  Yoshua Bengio, et al. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, 2011, ICML.

[27]  Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[28]  Laurent Eyer, et al. EXPLORING THE VARIABLE SKY WITH LINEAR. III. CLASSIFICATION OF PERIODIC LIGHT CURVES, 2013, arXiv:1308.0357.

[29]  G. Beylkin. On the Fast Fourier Transform of Functions with Singularities, 1995.

[30]  J. Friedman, et al. FLEXIBLE PARSIMONIOUS SMOOTHING AND ADDITIVE MODELING, 1989.

[31]  Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[32]  Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[33]  Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[34]  Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[35]  Pavlos Protopapas, et al. CLUSTERING-BASED FEATURE LEARNING ON VARIABLE STARS, 2016, arXiv.

[36]  Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[37]  J. Curran, et al. VAST: An ASKAP Survey for Variables and Slow Transients, 2012, Publications of the Astronomical Society of Australia.

[38]  Patrice Y. Simard, et al. Best practices for convolutional neural networks applied to visual document analysis, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[39]  J. G. Jernigan, et al. First Results from the All-Sky Monitor on the Rossi X-Ray Timing Explorer, 1996, arXiv:astro-ph/9608109.

[40]  J. Richards, et al. ON MACHINE-LEARNED CLASSIFICATION OF VARIABLE STARS WITH SPARSE AND NOISY TIME-SERIES DATA, 2011, arXiv:1101.1959.

[41]  Charles Elkan, et al. Learning to Diagnose with LSTM Recurrent Neural Networks, 2015, ICLR.

[42]  Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.