CONSTRUCTION OF A CALIBRATED PROBABILISTIC CLASSIFICATION CATALOG: APPLICATION TO 50k VARIABLE SOURCES IN THE ALL-SKY AUTOMATED SURVEY

With growing data volumes from synoptic surveys, astronomers must necessarily become more removed from the discovery and introspection processes. Given the scarcity of follow-up resources, the frameworks that replace these human roles bear a particularly sharp onus to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies in population studies. Here, we describe a process that uses machine learning to produce a probabilistic classification catalog of variability from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS) and release the Machine-learned ASAS Classification Catalog (MACC), a 28-class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
