VAST Memo Offline and Online Classification of Simulated VAST Transients

VAST is an unprecedented wide-field survey planned with ASKAP, the Australian SKA Pathfinder, that will enable novel scientific discoveries related to known and unknown classes of radio transients and variables. The VAST data processing pipeline extracts sources from 5-second images and builds light curves that are stored in a data archive for use by science users. This memo addresses two source classification tasks that occur within the pipeline. The first occurs at the archive level, where science users may issue queries for known source types (offline classification). The second occurs during real-time processing, in order to trigger appropriate follow-up when transient phenomena are detected (online classification). Both tasks require automated methods to classify sources in the time domain. Given the unprecedented observing characteristics of VAST, it is important to estimate classification performance in both settings and to determine best practices prior to the commissioning of ASKAP's BETA in 2012. This memo identifies candidate light curve characterizations and classification algorithms, and studies their performance under different observing strategies and noise levels in both the offline and online settings. Our results show that the choice of light curve characterization influences classification performance more than the choice of learning algorithm, and that a combination of feature sets yields the best performance. We achieve approximately 93% classification accuracy in the offline case and approximately 70% in the online case. Classes that are commonly confused include novae versus supernovae, and extreme scattering events (ESEs) versus background sources.
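To make the offline/online distinction concrete, the following is a minimal sketch and not the memo's method. It assumes a toy simulator (`simulate`), two illustrative features (modulation index and peak-to-median ratio), and a nearest-centroid classifier, all of which are hypothetical stand-ins chosen for brevity rather than the feature sets or learning algorithms evaluated in the memo. Offline classification sees the full light curve; online classification sees only the samples observed so far.

```python
import math
import random
import statistics

def simulate(kind, n=100, rng=None):
    """Toy light curve (hypothetical): flat flux plus noise, with a burst for transients."""
    rng = rng or random.Random()
    flux = [1.0 + rng.gauss(0.0, 0.05) for _ in range(n)]
    if kind == "transient":
        for t in range(n // 2, n // 2 + 10):  # burst occupies samples 50..59
            flux[t] += 2.0
    return flux

def light_curve_features(flux):
    """Illustrative characterization: modulation index and peak-to-median ratio."""
    mean = statistics.mean(flux)
    modulation = statistics.pstdev(flux) / mean
    peak_ratio = max(flux) / statistics.median(flux)
    return [modulation, peak_ratio]

def train_centroids(curves, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    grouped = {}
    for flux, label in zip(curves, labels):
        grouped.setdefault(label, []).append(light_curve_features(flux))
    return {
        label: [statistics.mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def classify(flux, centroids):
    """Assign the class whose centroid is closest in feature space."""
    feats = light_curve_features(flux)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))

# Train on a small simulated set (fixed seed so the sketch is repeatable).
rng = random.Random(0)
labels = ["background", "transient"] * 20
curves = [simulate(label, rng=rng) for label in labels]
centroids = train_centroids(curves, labels)

# Offline: the full light curve is available.
test_curve = simulate("transient", rng=random.Random(1))
print("offline:", classify(test_curve, centroids))

# Online: only the first k samples have been observed so far.
for k in (40, 60):
    print(f"online after {k} samples:", classify(test_curve[:k], centroids))
```

In this toy setup the online classifier cannot distinguish the transient from background until the burst has entered the observed prefix, which illustrates one reason online accuracy can be expected to lag offline accuracy.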
