Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of machine-learning and template-fitting methods by building template SEDs directly from the training data. It is made computationally tractable with Gaussian processes operating in flux--redshift space, which encode the physics of redshift and the projection of galaxy SEDs onto photometric bandpasses. The method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can combine spectroscopic and deep many-band photometric data, which need not fully overlap spatially with the target survey or even involve the same photometric bands. We showcase the method on the $i$-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from Subaru and HST), and photometric redshifts are derived using only the shallower SDSS optical bands. We demonstrate that we obtain accurate redshift point estimates and probability distributions even though the training and target sets have very different redshift distributions, noise properties, and photometric bands. The model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts. This method opens a new era in which photometric redshifts for large photometric surveys are derived using a flexible yet physical model of the data trained on all available surveys (spectroscopic and photometric).
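As a rough illustration of the "physical model of photometric fluxes as a function of redshift" mentioned above, the following minimal sketch redshifts a toy rest-frame SED and projects it onto two bandpasses. It is not the Gaussian process model of the paper: the function name `band_flux`, the step-function SED, the top-hat filters, and the photon-counting weighting are all assumptions made for illustration, and the overall normalization (including distance dimming) is ignored, so only flux ratios are meaningful.

```python
import numpy as np


def _trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_flux(rest_wave, rest_sed, band_wave, band_response, z):
    """Photon-weighted mean flux density of a redshifted SED in one band.

    Hypothetical helper: arbitrary normalization, no luminosity-distance term.
    """
    # Observed-frame SED on the filter grid: f_obs(lam) = f_rest(lam / (1+z)) / (1+z)
    sed_obs = np.interp(band_wave / (1.0 + z), rest_wave, rest_sed) / (1.0 + z)
    # Photon-counting detectors weight the response by wavelength (assumption).
    weight = band_response * band_wave
    return _trapezoid(sed_obs * weight, band_wave) / _trapezoid(weight, band_wave)


# Toy rest-frame SED with a spectral break at 4000 Angstrom (illustrative only).
rest_wave = np.linspace(1000.0, 12000.0, 2201)
rest_sed = np.where(rest_wave >= 4000.0, 1.0, 0.2)

# Two made-up top-hat bandpasses (wavelengths in Angstrom).
blue_wave = np.linspace(4200.0, 5000.0, 161)
red_wave = np.linspace(6000.0, 7000.0, 201)
blue_resp = np.ones_like(blue_wave)
red_resp = np.ones_like(red_wave)


def blue_to_red(z):
    """Blue/red flux ratio (a linear-flux 'color') at redshift z."""
    blue = band_flux(rest_wave, rest_sed, blue_wave, blue_resp, z)
    red = band_flux(rest_wave, rest_sed, red_wave, red_resp, z)
    return blue / red


color_z0 = blue_to_red(0.0)   # both bands sample the SED redward of the break
color_z04 = blue_to_red(0.4)  # the break has moved between the two bands
```

In this toy setup the blue-to-red flux ratio changes once the break moves between the two bands, which is the kind of flux--redshift dependence that a redshift estimator can exploit.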
