The overlooked potential of Generalized Linear Models in astronomy-II: Gamma regression and photometric redshifts

Machine learning techniques offer a valuable toolbox for astronomical problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the physical processes underlying the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including those related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function, we predict redshifts from the PHoto-z Accuracy Testing (PHAT) simulated catalogue and from a subset of the Sloan Digital Sky Survey (SDSS) Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ~1% for simulated and ~2% for real data. Moreover, we obtain this level of precision within a matter of seconds on a normal desktop computer and with training sets that contain merely thousands of galaxies. Our software is made publicly available as a user-friendly package developed in Python and R, and as an interactive web application. It allows users to apply a set of GLMs to their own photometric catalogues and generates publication-quality plots with minimal effort from the user. By making GLMs easier for the astronomical community to use, this paper series aims to make them widely known and to encourage their implementation in future large-scale projects, such as the Large Synoptic Survey Telescope.
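As a rough illustration of the approach summarized above, the sketch below fits a gamma GLM with a log link to synthetic data and computes a catastrophic outlier rate. It is not the authors' released package. The predictors are random stand-ins for photometric features, the coefficient values are invented, and the outlier criterion |z_phot - z_spec| / (1 + z_spec) > 0.15 is an assumed, commonly used definition that the abstract does not state. The fit uses iteratively reweighted least squares (IRLS); for the gamma family with a log link the working weights are constant, so each iteration reduces to an ordinary least-squares solve.

```python
import numpy as np


def fit_gamma_log_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a gamma GLM with log link by IRLS (Fisher scoring).

    With variance function V(mu) = mu^2 and link g(mu) = log(mu), the IRLS
    weights 1 / (V(mu) * g'(mu)^2) equal 1, so every step is plain OLS on
    the working response.
    """
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    # Starting values: least-squares fit to log(y)
    beta = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = A @ beta                 # linear predictor
        mu = np.exp(eta)               # inverse log link
        z = eta + (y - mu) / mu        # working response
        beta_new = np.linalg.lstsq(A, z, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def predict(beta, X):
    """Predicted mean response (here: redshift) for new predictors."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return np.exp(A @ beta)


def catastrophic_outlier_rate(z_phot, z_spec, threshold=0.15):
    """Fraction of objects with |dz| / (1 + z_spec) above the threshold.

    The 0.15 default is an assumption, not a value taken from the abstract.
    """
    return float(np.mean(np.abs(z_phot - z_spec) / (1.0 + z_spec) > threshold))


# Synthetic stand-in data (hypothetical; not PHAT or SDSS)
rng = np.random.default_rng(0)
n_train, n_test = 2000, 1000
true_beta = np.array([-1.0, 0.3, -0.2, 0.1])   # invented coefficients
shape = 50.0                                    # gamma shape (low scatter)


def simulate(n):
    X = rng.normal(size=(n, 3))
    mu = np.exp(true_beta[0] + X @ true_beta[1:])
    y = rng.gamma(shape, mu / shape)            # mean mu, strictly positive
    return X, y


X_train, z_train = simulate(n_train)
X_test, z_test = simulate(n_test)

beta_hat = fit_gamma_log_glm(X_train, z_train)
z_pred = predict(beta_hat, X_test)
rate = catastrophic_outlier_rate(z_pred, z_test)

print("estimated coefficients:", np.round(beta_hat, 3))
print("catastrophic outlier rate:", rate)
```

The log link keeps every predicted redshift positive, and the gamma family lets the scatter grow with the mean, which is the motivation the abstract gives for using this family on positive, continuous observables. In practice a tested GLM implementation from a statistics library would normally be preferred over a hand-written IRLS loop.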
