Automated Real-Time Classification and Decision Making in Massive Data Streams from Synoptic Sky Surveys

The nature of scientific and technological data collection is evolving rapidly: data volumes and rates are growing exponentially, along with their complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements through the optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma-ray bursts), relativistic phenomena (black hole formation, jets), and potentially hazardous asteroids. We have been developing a set of machine learning tools to detect, classify, and plan a response to transient events in astronomy, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the most potentially interesting events is a key bottleneck that limits the scientific returns from current and anticipated synoptic sky surveys. Similar challenges arise in other contexts, from environmental monitoring with sensor networks to autonomous spacecraft systems. Given the exponential growth of data rates and the time-critical nature of the response, a fully automated and robust approach is needed. We describe the results obtained to date and possible future developments.
