Random forest automated supervised classification of Hipparcos periodic variable stars

We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize how the type depends on a number of attributes. The most useful attributes, as evaluated with the random forest methodology, are, in decreasing order of importance: the period, the amplitude, the V − I colour index, the absolute magnitude, the residual around the folded light-curve model, the skewness of the magnitude distribution, and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multistage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation (CV) experiments, the rate of correct classification is between 90 and 100 per cent, depending on the variability type. The main misclassification cases, which reach a rate of about 10 per cent, arise from confusion between SPB and ACV blue variables, and between eclipsing binaries, ellipsoidal variables and other variability types. Our training set and the predicted types for the other Hipparcos periodic stars are available online.
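As a rough illustration of how a per-type rate of correct classification can be obtained from a 10-fold CV experiment, the following minimal sketch pools the held-out predictions of all folds into a confusion table. It is not the paper's pipeline: the data are synthetic, the labels `type_A`, `type_B` and `type_C` and their attribute centres are invented for the example, and a simple nearest-centroid classifier stands in for the random forest so that the block needs only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): mean attribute vector per type."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for d, value in enumerate(row):
            sums[label][d] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the type whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def cross_validate(X, y, k=10, seed=0):
    """Return {true_type: {predicted_type: count}} pooled over k held-out folds."""
    confusion = defaultdict(lambda: defaultdict(int))
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            confusion[y[i]][predict(model, X[i])] += 1
    return confusion


def per_type_rate(confusion):
    """Fraction of each true type that was predicted correctly."""
    return {t: row[t] / sum(row.values()) for t, row in confusion.items()}


# Synthetic two-attribute data (think of log-period and amplitude).
# The centres below are invented for illustration, not taken from the paper.
rng = random.Random(42)
centres = {"type_A": (-0.3, 0.5), "type_B": (0.8, 0.9), "type_C": (2.3, 2.0)}
X, y = [], []
for label, (mu_period, mu_amplitude) in centres.items():
    for _ in range(100):
        X.append([rng.gauss(mu_period, 0.2), rng.gauss(mu_amplitude, 0.2)])
        y.append(label)

confusion = cross_validate(X, y, k=10)
rates = per_type_rate(confusion)
for t in sorted(rates):
    print(t, round(rates[t], 3))
```

Each star is predicted exactly once, by a model that never saw it during training, so the diagonal of the pooled table divided by the row total gives the per-type rate, and the off-diagonal entries show which types are confused with which.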
