Real-time data mining of massive data streams from synoptic sky surveys

The nature of scientific and technological data collection is evolving rapidly: data volumes and rates grow exponentially, with increasing complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements via optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma-ray bursts), relativistic phenomena (black hole formation, jets), and potentially hazardous asteroids. We have been developing a set of machine learning tools to detect, classify, and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the potentially most interesting events is a key bottleneck that limits the scientific returns from current and anticipated synoptic sky surveys. Similar challenges arise in other contexts, from environmental monitoring using sensor networks to autonomous spacecraft systems. Given the exponential growth of data rates and the need for a time-critical response, we need a fully automated and robust approach. We describe the results obtained to date and possible future developments.

Advances in the automated classification of transient events in synoptic sky surveys.
Innovative methods for the analysis of irregularly sampled, heterogeneous time series.
A novel approach to machine-assisted discovery using symbolic regression.
Approaches to automated decision making based on the automated classification.
