Flashes in a star stream: Automated classification of astronomical transient events

Automated, rapid classification of transient events detected in modern synoptic sky surveys is essential for their scientific utility and for effective follow-up with scarce resources. This presents some unusual challenges: the data are sparse, heterogeneous, incomplete, and evolving in time, and most of the relevant information comes not from the data stream itself but from a variety of archival data and contextual information (spatial, temporal, and multi-wavelength). We are exploring a variety of novel techniques, mostly Bayesian, to respond to these challenges, using the ongoing Catalina Real-Time Transient Survey (CRTS) as a testbed. Current surveys already overwhelm our ability to follow up all of the potentially interesting events, and these challenges will grow by orders of magnitude over the next decade as more ambitious sky surveys get under way. While we focus on an application in a specific domain (astrophysics), these challenges are more broadly relevant to event or anomaly detection and knowledge discovery in massive data streams.
