Classifying Periodic Astrophysical Phenomena from non-survey optimized variable-cadence observational data

Abstract: Modern time-domain astronomy can collect very large amounts of data on millions of objects in real time. Methods and systems for the automated classification of time-domain astronomical objects are therefore of great importance. The Liverpool Telescope has a number of wide-field imaging instruments mounted on its structure, the Small Telescopes Installed at the Liverpool Telescope. These instruments have been in operation since March 2009, gathering data on large areas of sky around the current field of view of the main telescope and generating a large dataset containing millions of light sources. The instruments are inexpensive to run because they do not require a separate telescope, but this style of surveying the sky introduces structured artifacts into the data because of the variable cadence at which sky fields are resampled. These artifacts can make light sources appear variable and must be addressed in any processing method. Data from large sky surveys can lead to the discovery of interesting new variable objects, and efficient software and analysis tools are required to rapidly determine which potentially variable objects are worthy of further telescope time. Machine learning offers a route to the rapid detection of variability by characterising detected signals relative to previously seen exemplars. In this paper, we introduce a processing system designed for use with the Liverpool Telescope that identifies potentially interesting objects by applying a novel representation learning approach to data collected automatically from the wide-field instruments. Our method automatically produces a set of classification features by applying Principal Component Analysis to a set of variable light curves, each represented by a piecewise polynomial fitted via a genetic algorithm to the epoch-folded data.
The epoch-folding requires the selection of a candidate period for each variable light curve, which is identified using a genetic algorithm period estimation method developed specifically for this dataset. A Random Forest classifier is then applied to the learned features to determine whether a light curve is generated by an object of interest. This system allows the telescope to automatically identify new targets through passive observations that do not affect day-to-day operations, because the unique artifacts resulting from such a survey method are accounted for in the methods. We demonstrate the power of this feature extraction method relative to the feature engineering of previous studies by training classification models on 859 light curves of 12 known variable star classes from our dataset. Our new features produce a model with a superior mean cross-validation F1 score of 0.4729 (standard deviation 0.0931), compared with 0.3902 (standard deviation 0.0619) for the engineered features. The features extracted by the representation learning are given relatively high importance in the final classification model. Additionally, we compare engineered features computed on the interpolated polynomial fits with those computed on the raw light curve and show that the former produce more reliable distributions when the period estimation is correct.
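
As a rough illustration of the epoch-folding step mentioned in the abstract, the sketch below folds an irregularly sampled light curve on a candidate period and averages it in phase bins. It is a minimal sketch under stated assumptions, not the paper's pipeline: the function names `epoch_fold` and `bin_means`, the reference epoch `t0`, the bin count, and the synthetic sinusoidal data are all hypothetical, and the candidate period is simply given here rather than estimated with a genetic algorithm.

```python
import math


def epoch_fold(times, magnitudes, period, t0=0.0):
    """Fold a light curve on a candidate period.

    Returns (phase, magnitude) pairs sorted by phase, with phase in [0, 1).
    `t0` is an assumed reference epoch, not a value from the paper.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    folded = [(((t - t0) / period) % 1.0, m) for t, m in zip(times, magnitudes)]
    folded.sort(key=lambda pair: pair[0])
    return folded


def bin_means(folded, n_bins=10):
    """Mean magnitude in each phase bin; None marks an empty bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phase, mag in folded:
        idx = min(int(phase * n_bins), n_bins - 1)  # guard the upper edge
        sums[idx] += mag
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Synthetic example (assumed data): a sinusoid with a 2.5-day period,
# sampled at irregular times to mimic a variable observing cadence.
period = 2.5
times = [0.0, 0.7, 3.1, 4.4, 8.9, 10.2, 13.6, 17.3, 21.05, 26.4]
mags = [12.0 + 0.3 * math.sin(2 * math.pi * t / period) for t in times]

folded = epoch_fold(times, mags, period)
profile = bin_means(folded, n_bins=5)
```

When the candidate period is correct, the folded points trace a single coherent cycle even though the observations span many cycles at uneven spacing. A wrong period scatters the points across phase, which is why the quality of the period estimate matters for any features computed on the folded curve.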
