Data Driven Discovery in Astrophysics

We review some aspects of the current state of data-intensive astronomy, its methods, and some outstanding data analysis challenges. Astronomy is at the forefront of "big data" science, with exponentially growing data volumes and data rates and ever-increasing complexity, now entering the Petascale regime. Telescopes and observatories on the ground and in space, covering the full range of wavelengths, feed data through processing pipelines into dedicated archives, where they can be accessed for scientific analysis. Most of the large archives are connected through the Virtual Observatory framework, which provides interoperability standards and services and effectively constitutes a global data grid of astronomy. Making discoveries in this overabundance of data requires the application of novel machine learning tools. We describe some recent examples of such applications.
