Collaborative Nested Sampling: Big Data versus Complex Physical Models

The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties are often an afterthought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require an independent run for each data set, implying an infeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions need to be made about homogeneous errors, Gaussianity, the form of the model, or the heterogeneity or completeness of the observations. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.
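The abstract does not spell out the algorithm, so the following is only a minimal sketch of one ingredient that such a scheme plausibly relies on: evaluating the slow physical model once per proposed parameter value and reusing that single prediction to score every data set. Everything here is an assumption made for illustration, including the toy linear `physical_model`, the Gaussian `log_likelihood` with per-point errors, and the helper `score_all_datasets`. It is not the paper's implementation, and it does not include the nested sampling bookkeeping that would give the sub-linear scaling.

```python
import math
import random

MODEL_CALLS = 0  # counts evaluations of the (pretend) expensive model


def physical_model(theta, grid):
    """Hypothetical stand-in for a slow physical model: a line with slope theta."""
    global MODEL_CALLS
    MODEL_CALLS += 1
    return [theta * x for x in grid]


def log_likelihood(prediction, observed, sigma):
    """Gaussian log-likelihood with per-point errors (illustrative choice only)."""
    return sum(
        -0.5 * ((o - p) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for p, o, s in zip(prediction, observed, sigma)
    )


def score_all_datasets(theta, grid, datasets):
    """Evaluate the model once, then compare that one prediction to every data set."""
    prediction = physical_model(theta, grid)  # the only expensive call
    return [log_likelihood(prediction, obs, sig) for obs, sig in datasets]


rng = random.Random(0)
grid = [0.5, 1.0, 1.5, 2.0]

# Simulate 100 data sets, each with its own true slope and its own error bars.
datasets = []
for _ in range(100):
    true_slope = rng.uniform(0.0, 5.0)
    sigma = [rng.uniform(0.1, 0.5) for _ in grid]
    observed = [true_slope * x + rng.gauss(0.0, s) for x, s in zip(grid, sigma)]
    datasets.append((observed, sigma))

# Score 20 proposed parameter values against all data sets.
proposals = [rng.uniform(0.0, 5.0) for _ in range(20)]
scores = [score_all_datasets(theta, grid, datasets) for theta in proposals]

print(MODEL_CALLS, "model calls for", len(proposals) * len(datasets), "likelihood values")
```

In this toy setup the model is called once per proposal rather than once per proposal and data set, while each data set keeps its own error bars. Independent runs would instead repeat the model evaluation separately for every data set.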
