A machine learning approach to galaxy properties: Joint redshift–stellar mass probability distributions with Random Forest

We demonstrate that highly accurate joint redshift–stellar mass probability distribution functions (PDFs) can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep photometry in the griz bands, and the second reflecting the photometric scatter present in the main DES survey, with carefully constructed representative training data in each case. We validate our joint PDFs for 10 699 test galaxies by utilizing the copula probability integral transform and the Kendall distribution function, and we validate the marginals with their univariate counterparts. Benchmarked against a basic set-up of the template-fitting code bagpipes, our ML-based method outperforms template fitting on all of our predefined performance metrics. In addition to being accurate, the RF is extremely fast, able to compute joint PDFs for a million galaxies in just under 6 min with consumer computer hardware. Such speed enables PDFs to be derived in real time within analysis codes, solving potential storage issues. As part of this work we have developed galpro, a highly intuitive and efficient Python package to rapidly generate multivariate PDFs on-the-fly. galpro is documented and available for researchers to use in their cosmology and galaxy evolution studies.
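The idea of a joint PDF from a Random Forest, checked with probability integral transforms, can be illustrated with a minimal sketch. This is not the galpro API and it does not use DES or COSMOS2015 data: the synthetic "bands", the helper names (`joint_samples`, `marginal_pit`, `cop_pit`) and the choice to pool training targets that share a leaf with the query galaxy are all assumptions made for illustration, using scikit-learn's `RandomForestRegressor`. The copula PIT is computed in one common empirical form, the Kendall distribution function evaluated at the predictive joint CDF of the true value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT DES or COSMOS2015): two correlated targets
# (redshift, log stellar mass) and four noisy features playing the role of bands.
n_gal = 1200
z = rng.uniform(0.0, 1.5, n_gal)
log_mass = 9.0 + 0.8 * z + rng.normal(0.0, 0.3, n_gal)
y = np.column_stack([z, log_mass])
mixing = rng.normal(size=(2, 4))
X = y @ mixing + rng.normal(0.0, 0.2, size=(n_gal, 4))

X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

# A multi-output forest: each leaf holds (redshift, mass) pairs together,
# so the correlation between the two targets is preserved.
forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
forest.fit(X_train, y_train)

# Leaf index of every galaxy in every tree, shape (n_galaxies, n_trees).
train_leaves = forest.apply(X_train)
test_leaves = forest.apply(X_test)


def joint_samples(i):
    """Pool the training targets that land in the same leaf as test galaxy i.

    The pool is taken over the full training set, so it also includes
    training galaxies a given tree did not see in its bootstrap sample.
    """
    pooled = [y_train[train_leaves[:, t] == leaf]
              for t, leaf in enumerate(test_leaves[i])]
    return np.vstack(pooled)  # shape (n_samples, 2)


def marginal_pit(samples, truth):
    """Univariate PIT per target: predictive CDF evaluated at the true value."""
    return np.mean(samples <= truth, axis=0)


def cop_pit(samples, truth):
    """Empirical copula PIT: Kendall distribution function at H(truth)."""
    # H(truth): fraction of samples below the truth in both dimensions.
    h_truth = np.mean(np.all(samples <= truth, axis=1))
    # H(s_j) for every sample s_j, from pairwise comparisons.
    h_samples = np.all(samples[:, None, :] <= samples[None, :, :], axis=2).mean(axis=0)
    # K(w): fraction of samples whose joint CDF value does not exceed w.
    return np.mean(h_samples <= h_truth)


pits = np.array([marginal_pit(joint_samples(i), y_test[i]) for i in range(len(y_test))])
cop_pits = np.array([cop_pit(joint_samples(i), y_test[i]) for i in range(len(y_test))])
```

For well-calibrated PDFs, histograms of each column of `pits` and of `cop_pits` should be close to flat on [0, 1]. Systematic departures from flatness point to PDFs that are biased, too broad or too narrow.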

D. J. James | B. Yanny | Francisco J. Castander | E. Bertin | D. W. Gerdes | K. Bechtol | P. Melchior | Alex Drlica-Wagner | H. T. Diehl | B. Flaugher | T. N. Varga | E. Gaztañaga | H. Lin | A. Amon | M. Aguena | F. Menanteau | Nikolay Kuropatkin | S. Mucesh | W. G. Hartley | A. Palmese | O. Lahav | L. Whiteway | G. M. Bernstein | A. Carnero Rosell | M. Carrasco Kind | A. Choi | K. Eckert | S. Everett | D. Gruen | R. A. Gruendl | I. Harrison | E. M. Huff | Ignacio Sevilla-Noarbe | Erin Sheldon | S. Allam | D. Bacon | S. Bhargava | D. Brooks | J. Carretero | C. Conselice | M. Costanzi | Martín Crocce | L. N. da Costa | M. E. S. Pereira | J. De Vicente | S. Desai | A. E. Evrard | I. Ferrero | Pablo Fosalba | J. Frieman | J. García-Bellido | J. Gschwend | G. Gutiérrez | S. R. Hinton | D. L. Hollowood | K. Honscheid | K. Kuehn | M. Lima | M. A. G. Maia | R. Miquel | R. Morgan | F. Paz-Chinchón | A. A. Plazas | E. Sánchez | V. Scarpine | M. Schubnell | S. Serrano | M. Smith | E. Suchyta | G. Tarlé | D. Thomas | C. To | R. D. Wilkinson

[52]  韩云坤 Decoding spectral energy distributions of dust-obscured starburst-AGN , 2011 .

[53]  James E. Geach,et al.  Unsupervised self-organized mapping: a versatile empirical tool for object selection, classification and redshift estimation in large surveys , 2011, 1110.0005.

[54]  R. Nichol,et al.  Euclid Definition Study Report , 2011, 1110.3193.

[55]  Daniel J. B. Smith,et al.  MAGPHYS: a publicly available tool to interpret observed galaxy SEDs , 2011, Proceedings of the International Astronomical Union.

[56]  B. Groves,et al.  Fitting the integrated spectral energy distributions of galaxies , 2010, 1008.0395.

[57]  Zhanwen Han,et al.  DECODING SPECTRAL ENERGY DISTRIBUTIONS OF DUST-OBSCURED STARBURST–ACTIVE GALACTIC NUCLEUS , 2012, 1202.6203.

[58]  M. J. Way,et al.  Can Self-Organizing Maps Accurately Predict Photometric Redshifts? , 2012 .

[59]  C. Conroy Modeling the Panchromatic Spectral Energy Distributions of Galaxies , 2013, 1301.7095.

[60]  R. J. Brunner,et al.  TPZ: photometric redshift PDFs and ancillary information by using prediction trees and random forests , 2013, 1303.7269.

[61]  Robert J. Brunner,et al.  SOMz: photometric redshift PDFs with self organizing maps and random atlas , 2013, ArXiv.

[62]  Daniel Foreman-Mackey,et al.  emcee: The MCMC Hammer , 2012, 1202.3665.

[63]  Tilmann Gneiting,et al.  Copula Calibration , 2013, 1307.7650.

[64]  Zhanwen Han,et al.  BayeSED: A GENERAL APPROACH TO FITTING THE SPECTRAL ENERGY DISTRIBUTION OF GALAXIES , 2014, 1408.6399.

[65]  M. Fairbairn,et al.  GAz: a genetic algorithm for photometric redshift estimation , 2014, 1412.5997.

[66]  A. Fontana,et al.  Deconstructing the Galaxy Stellar Mass Function with UKIDSS and CANDELS: The Impact of Colour, Structure and Environment , 2014, 1411.3339.

[67]  Stephen J. Roberts,et al.  A Sparse Gaussian Process Framework for Photometric Redshift Estimation , 2015, ArXiv.

[68]  C. Bonnett Using neural networks to estimate redshift distributions. An application to CFHTLenS , 2013, 1312.1287.

[69]  Eibe Frank,et al.  Accurate photometric redshift probability density estimation – method comparison and application , 2015, 1503.08215.

[70]  Robert Armstrong,et al.  GalSim: The modular galaxy image simulation toolkit , 2014, Astron. Comput..

[71]  Iftach Sadeh,et al.  ANNz2: Photometric Redshift and Probability Distribution Function Estimation using Machine Learning , 2015, 1507.00490.

[72]  Ben Hoyle,et al.  Measuring photometric redshifts using galaxy images and Deep Neural Networks , 2015, Astron. Comput..

[73]  C. B. D'Andrea,et al.  No Galaxy Left Behind: Accurate Measurements with the Faintest Objects in the Dark Energy Survey , 2015, 1507.08336.

[74]  Modelling and interpreting spectral energy distributions of galaxies with BEAGLE , 2016 .

[75]  Fabian Gieseke,et al.  Sacrificing information for the greater good: how to select photometric bands for optimal accuracy , 2015, Monthly Notices of the Royal Astronomical Society.

[76]  C. B. D'Andrea,et al.  Redshift distributions of galaxies in the Dark Energy Survey Science Verification shear catalogue and implications for weak lensing , 2015, Physical Review D.

[77]  O. Fèvre,et al.  THE COSMOS2015 CATALOG: EXPLORING THE 1 < z < 6 UNIVERSE WITH HALF A MILLION GALAXIES , 2016, 1604.02350.

[78]  Fabian Gieseke,et al.  Uncertain Photometric Redshifts , 2016 .

[79]  How to measure metallicity from five-band photometry with supervised machine learning algorithms , 2015, 1510.08076.

[80]  S. Charlot,et al.  Modelling and interpreting spectral energy distributions of galaxies with BEAGLE , 2016, 1603.03037.

[81]  D. Gerdes,et al.  Comparing Dark Energy Survey and HST–CLASH observations of the galaxy cluster RXC J2248.7−4431: implications for stellar mass versus dark matter , 2016, 1601.00589.

[82]  R. Nichol,et al.  The Dark Energy Survey: more than dark energy - an overview , 2016, 1601.00329.

[83]  Satoshi Miyazaki,et al.  Photometric Redshifts for Hyper Suprime-Cam Subaru Strategic Program Data Release 1 , 2017, 1704.05988.

[84]  D. W. Gerdes,et al.  Evolution of Galaxy Luminosity and Stellar-Mass Functions since $z=1$ with the Dark Energy Survey Science Verification Data , 2017 .

[85]  Kai Lars Polsterer,et al.  Photometric redshift estimation via deep learning , 2017, 1706.02467.

[86]  O. Ilbert,et al.  The many flavours of photometric redshifts , 2018, Nature Astronomy.

[87]  Zhanwen Han,et al.  A Comprehensive Bayesian Discrimination of the Simple Stellar Population Model, Star Formation History, and Dust Attenuation Law in the Spectral Energy Distribution Modeling of Galaxies , 2018, The Astrophysical Journal Supplement Series.

[88]  B. Yanny,et al.  Dark Energy Survey Year 1 Results: The Photometric Data Set for Cosmology , 2017, 1708.01531.

[89]  Emmanuel Bertin,et al.  Photometric redshifts from SDSS images using a convolutional neural network , 2018, Astronomy & Astrophysics.

[90]  Saso Dzeroski,et al.  Ensembles for multi-target regression with random output selections , 2018, Machine Learning.

[91]  J. Tinker,et al.  The Connection Between Galaxies and Their Dark Matter Halos , 2018, Annual Review of Astronomy and Astrophysics.

[92]  R. Davé,et al.  Inferring the star formation histories of massive quiescent galaxies with bagpipes: evidence for multiple quenching mechanisms , 2017, Monthly Notices of the Royal Astronomical Society.

[93]  N. Aghanim,et al.  Star formation rates and stellar masses from machine learning , 2019, Astronomy & Astrophysics.

[94]  Stephen Kent,et al.  Dark Energy Survey’s Observation Strategy, Tactics, and Exposure Scheduler , 2019, 1912.06254.

[95]  M. P. Hobson,et al.  Importance Nested Sampling and the MultiNest Algorithm , 2013, The Open Journal of Astrophysics.

[96]  G. Longo,et al.  Star formation rates for photometric samples of galaxies using machine learning methods , 2019, Monthly Notices of the Royal Astronomical Society.

[97]  N Tonello,et al.  The PAU Survey: early demonstration of photometric redshift performance in the COSMOS field , 2018, Monthly Notices of the Royal Astronomical Society.

[98]  D. Baron Machine Learning in Astronomy: a practical overview , 2019, 1904.07248.

[99]  B. A. Boom,et al.  First Measurement of the Hubble Constant from a Dark Standard Siren using the Dark Energy Survey Galaxies and the LIGO/Virgo Binary–Black-hole Merger GW170814 , 2019, The Astrophysical Journal.

[100]  Eleonora Di Valentino,et al.  Gravitational wave cosmology and astrophysics with large spectroscopic galaxy surveys , 2019, 1903.04730.

[101]  D. Corre,et al.  CIGALE: a python Code Investigating GALaxy Emission , 2018, Astronomy & Astrophysics.

[102]  J. Brinchmann,et al.  Euclid preparation , 2020, 2009.12112.

[103]  D. Gerdes,et al.  A Statistical Standard Siren Measurement of the Hubble Constant from the LIGO/Virgo Gravitational Wave Compact Object Merger GW190814 and Dark Energy Survey Galaxies , 2020, The Astrophysical Journal.

[104]  The LSST Dark Energy Science Collaboration,et al.  Evaluation of probabilistic photometric redshift estimation approaches for LSST , 2020 .

[105]  D. Gerdes,et al.  Stellar mass as a galaxy cluster mass proxy: application to the Dark Energy Survey redMaPPer clusters , 2019, Monthly Notices of the Royal Astronomical Society.

[106]  Il,et al.  Dark Energy Survey Year 3 Results: Deep Field Optical + Near-Infrared Images and Catalogue , 2020, 2012.12824.

[107]  D. Gerdes,et al.  Dark Energy Survey Year 3 Results: Photometric Data Set for Cosmology , 2020, Astrophysical Journal Supplement Series.

[108]  Tucson,et al.  Dark Energy Survey Year 3 Results: Measuring the Survey Transfer Function with Balrog , 2022, The Astrophysical Journal Supplement Series.