Data Mining and Machine-Learning in Time-Domain Discovery & Classification

The changing heavens have played a central role in the scientific effort of astronomers for centuries. Galileo's synoptic observations of the moons of Jupiter and the phases of Venus, starting in 1610, provided strong refutation of Ptolemaic cosmology. In more modern times, the discovery of a relationship between period and luminosity in some pulsational variable stars led to the inference of the size of the Milky Way, the distance scale to the nearest galaxies, and the expansion of the Universe. Distant explosions of supernovae were used to uncover the existence of dark energy and provide a precise numerical account of dark matter. Indeed, time-domain observation of transient events and variable stars, as a technique, influences a broad diversity of pursuits across astronomy. While, at a fundamental level, the nature of the scientific pursuit remains unchanged, the advent of astronomy as a data-driven discipline presents fundamental challenges to the way in which the scientific process must now be conducted. Digital images (and data cubes) are not only getting larger; there are also more of them. On logistical grounds, this taxes storage and transport systems. But it also implies that the intimate connection that astronomers have always enjoyed with their data (from collection to processing to analysis to inference) must necessarily evolve. The pathway to scientific inference is now influenced, if not driven, by modern automation processes, computing, data mining, and machine learning. The emerging reliance on computation and machine learning is a general one, but the time-domain aspect of the data and the objects of interest presents some unique challenges, which we describe and explore in this chapter.
