Imbalance Learning for Variable Star Classification

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine-learning-based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies by developing a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance yielded promising results on Catalina Real-Time Transient Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying 'data-level' approaches that directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: $\textit{R}$andomly $\textit{A}$ugmented $\textit{S}$ampled $\textit{L}$ight curves from magnitude $\textit{E}$rror ($\texttt{RASLE}$), augmenting light curves with Gaussian Process modelling ($\texttt{GpFit}$), and the Synthetic Minority Over-sampling Technique ($\texttt{SMOTE}$). When combining the 'algorithm-level' approach (i.e. the hierarchical scheme) with the 'data-level' approach, we further improve variable star classification accuracy by 1-4$\%$. We found that a higher classification rate is obtained when using $\texttt{GpFit}$ in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and, perhaps, enhanced features.
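As a rough illustration of what 'data-level' augmentation can look like, the sketch below shows two of the ideas named above in simplified form. It is not the implementation used in this work. The first function assumes that sampling from magnitude error means redrawing each observed magnitude from a Gaussian centred on the measurement, with the reported error as its standard deviation. The second is a bare-bones version of the interpolation step behind $\texttt{SMOTE}$, applied to a feature matrix of minority-class objects. The function names `augment_from_errors` and `smote_like`, and all parameter names, are hypothetical.

```python
import numpy as np


def augment_from_errors(mag, mag_err, n_copies, rng):
    """Draw n_copies noisy realisations of one light curve.

    Assumption (not confirmed by the text): each magnitude is redrawn
    from a normal distribution N(mag_i, mag_err_i).
    Returns an array of shape (n_copies, len(mag)).
    """
    mag = np.asarray(mag, dtype=float)
    mag_err = np.asarray(mag_err, dtype=float)
    return rng.normal(loc=mag, scale=mag_err, size=(n_copies, mag.size))


def smote_like(X_minority, n_new, k, rng):
    """Create n_new synthetic feature vectors for a minority class.

    Simplified SMOTE-style step: pick a random minority sample, pick one
    of its k nearest minority neighbours, and interpolate between them.
    Requires k <= number of minority samples - 1.
    """
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)    # index of the seed sample
    pick = rng.integers(0, k, size=n_new)    # which neighbour to use
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))             # interpolation fraction in [0, 1)
    return X[base] + gap * (X[partner] - X[base])


rng = np.random.default_rng(42)

# Toy light curve: magnitudes and their reported errors.
mag = np.array([15.0, 15.2, 14.9, 15.1])
mag_err = np.array([0.10, 0.10, 0.20, 0.15])
noisy_curves = augment_from_errors(mag, mag_err, n_copies=5, rng=rng)

# Toy minority-class feature matrix (e.g. period and amplitude).
X_rare = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_synthetic = smote_like(X_rare, n_new=10, k=2, rng=rng)
```

The two functions act at different stages: the first perturbs the light curves themselves, so features must be recomputed from each new realisation, while the second works directly in feature space. For real use, a maintained implementation such as the `SMOTE` class in the `imbalanced-learn` package is likely a better choice than this sketch.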
