Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference

It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) or training-based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings. As an alternative to methods that focus on predicting the response (or parameters) $\mathbf{y}$ from features $\mathbf{x}$, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density function (PDF) $\mathrm{p}(\mathbf{y}|\mathbf{x})$ of $\mathbf{y}$ given (i.e., conditional on) $\mathbf{x}$. As there is no one-size-fits-all CDE method, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings and be easily fit to the problem at hand. Specifically, we introduce four CDE software packages in $\texttt{Python}$ and $\texttt{R}$ based on ML prediction methods adapted and optimized for CDE: $\texttt{NNKCDE}$, $\texttt{RFCDE}$, $\texttt{FlexCode}$, and $\texttt{DeepCDE}$. Furthermore, we present the $\texttt{cdetools}$ package, which includes functions for computing a CDE loss function for tuning and assessing the quality of individual PDFs, along with diagnostic functions. We provide sample code in $\texttt{Python}$ and $\texttt{R}$ as well as examples of applications to photometric redshift estimation and likelihood-free cosmological inference via CDE.
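As a minimal sketch of the kind of CDE loss mentioned above, the example below estimates the quantity $\mathbb{E}\left[\int \hat{p}(y|\mathbf{x})^2\,dy\right] - 2\,\mathbb{E}\left[\hat{p}(Y|\mathbf{X})\right]$ from densities evaluated on a grid. It is an illustration only and does not reproduce the $\texttt{cdetools}$ API: the function names `cde_loss_estimate` and `gaussian_pdf`, the toy data model, and the nearest-grid-point lookup are all assumptions made for this example.

```python
import numpy as np


def gaussian_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y (hypothetical helper for the toy example)."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def cde_loss_estimate(cde_grid, y_grid, y_obs):
    """Empirical CDE loss, up to an additive constant that does not depend on the estimator.

    cde_grid: (n, m) array, estimated density of observation i at grid point j
    y_grid:   (m,) increasing grid of response values
    y_obs:    (n,) observed responses
    """
    cde_grid = np.asarray(cde_grid, dtype=float)
    y_grid = np.asarray(y_grid, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)

    # First term: trapezoid-rule integral of the squared density, averaged over observations.
    squared = cde_grid ** 2
    widths = np.diff(y_grid)
    integrals = np.sum(0.5 * (squared[:, 1:] + squared[:, :-1]) * widths, axis=1)

    # Second term: estimated density at the observed response (nearest grid point).
    nearest = np.abs(y_grid[None, :] - y_obs[:, None]).argmin(axis=1)
    density_at_obs = cde_grid[np.arange(len(y_obs)), nearest]

    return float(np.mean(integrals) - 2.0 * np.mean(density_at_obs))


# Toy data (assumed for illustration): x ~ N(0, 1) and y | x ~ N(x, 1).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
y_grid = np.linspace(-8.0, 8.0, 801)

# Candidate 1: the true conditional density p(y | x).
cde_true = gaussian_pdf(y_grid[None, :], x[:, None], 1.0)
# Candidate 2: the marginal density of y, which ignores x entirely.
cde_marginal = np.tile(gaussian_pdf(y_grid, 0.0, np.sqrt(2.0)), (n, 1))

loss_true = cde_loss_estimate(cde_true, y_grid, y)
loss_marginal = cde_loss_estimate(cde_marginal, y_grid, y)
print(loss_true, loss_marginal)
```

A lower value indicates a better estimate, so in this toy setting the true conditional density should score below the estimate that ignores $\mathbf{x}$. The same comparison can be used to choose tuning parameters on held-out data.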
