GPz: non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts

The next generation of cosmology experiments will be required to use photometric redshifts rather than spectroscopic redshifts. Obtaining accurate and well-characterized photometric redshift distributions is therefore critical for Euclid, the Large Synoptic Survey Telescope and the Square Kilometre Array. Beyond accurate single point estimates, accurate variance predictions are also crucial, as they can be used to optimize the galaxy sample for a specific experiment (e.g. weak lensing, baryon acoustic oscillations, supernovae), trading off completeness against reliability. The various sources of uncertainty in the photometry and redshift measurements put a lower bound on the accuracy that any model can hope to achieve. The intrinsic uncertainty associated with the estimates is often non-uniform and input-dependent, a property known in statistics as heteroscedastic noise. However, existing approaches are susceptible to outliers, do not account for variance induced by non-uniform data density, and in most cases require manual tuning of many parameters. In this paper, we present a Bayesian machine learning approach, which we refer to as Gaussian processes for photometric redshifts (GPz), that jointly optimizes the model with respect to both the predictive mean and the predictive variance. The predictive variance of the model accounts for both the variance due to data density and the variance due to photometric noise. Using Sloan Digital Sky Survey (SDSS) DR12 data, we show that our approach substantially outperforms other machine learning methods for photo-z estimation and its associated variance, such as TPZ and ANNZ2. We provide MATLAB and Python implementations, available to download at https://github.com/OxfordML/GPz.
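To make the split of the predictive variance into a data-density term and a noise term concrete, here is a minimal sketch. It is not the GPz code: it is a Bayesian linear model on a fixed set of Gaussian radial basis functions (a common sparse stand-in for a full GP), with a per-sample noise variance that is simply assumed known through a hypothetical `noise_fn`. GPz, as described above, jointly optimizes the mean and the variance; here the noise model is fixed to keep the example short. All function names, the toy 1-D data, and the parameter values are illustrative assumptions.

```python
import numpy as np


def rbf_features(x, centres, length_scale):
    """Map 1-D inputs x (n,) to Gaussian basis-function features (n, m)."""
    d = x[:, None] - centres[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def fit_weights(x, y, noise_var, centres, length_scale, prior_precision=1e-2):
    """Posterior over basis weights when each sample has its own noise variance.

    Each residual is weighted by its precision 1/noise_var, so noisy points
    pull the fit less than clean ones (heteroscedastic likelihood).
    """
    phi = rbf_features(x, centres, length_scale)
    beta = 1.0 / noise_var                               # per-sample precision
    precision = prior_precision * np.eye(len(centres)) + phi.T @ (beta[:, None] * phi)
    cov = np.linalg.inv(precision)                       # posterior weight covariance
    mean_w = cov @ (phi.T @ (beta * y))                  # posterior weight mean
    return mean_w, cov


def predict(x_new, mean_w, cov, centres, length_scale, noise_fn):
    """Return predictive mean, model variance and noise variance at x_new."""
    phi = rbf_features(x_new, centres, length_scale)
    mean = phi @ mean_w
    model_var = np.einsum("ij,jk,ik->i", phi, cov, phi)  # large where data are sparse
    noise_var = noise_fn(x_new)                          # input-dependent noise
    return mean, model_var, noise_var


def noise_fn(x):
    # Hypothetical noise model: standard deviation grows linearly with x.
    return (0.05 + 0.1 * x) ** 2


rng = np.random.default_rng(0)
# Dense training data on [0, 1], only three points on (1, 2].
x_train = np.concatenate([np.linspace(0.0, 1.0, 200), np.linspace(1.2, 2.0, 3)])
y_train = np.sin(3 * x_train) + rng.normal(0.0, np.sqrt(noise_fn(x_train)))

centres = np.linspace(0.0, 2.0, 11)
length_scale = 0.2
mean_w, cov = fit_weights(x_train, y_train, noise_fn(x_train), centres, length_scale)

x_test = np.array([0.5, 1.8])
mean, model_var, noise_var = predict(x_test, mean_w, cov, centres, length_scale, noise_fn)
total_var = model_var + noise_var  # predictive variance = model term + noise term
for xi, m, mv, nv in zip(x_test, mean, model_var, noise_var):
    print(f"x={xi:.1f}  mean={m:+.3f}  model_var={mv:.4f}  noise_var={nv:.4f}")
```

At `x=0.5` the model variance is small because many training points constrain the nearby basis weights, while at `x=1.8` it is much larger because only a few points do; the noise term grows independently according to the assumed `noise_fn`. Reporting the two terms separately is what lets a user tell "the model has seen too little data here" apart from "the photometry itself is noisy here".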
