Approximating Likelihood Ratios with Calibrated Discriminative Classifiers

In many fields of science, generalized likelihood ratio tests are established tools for statistical inference. At the same time, it has become increasingly common that a simulator (or generative model) is used to describe complex processes that tie parameters $\theta$ of an underlying theory and measurement apparatus to high-dimensional observations $\mathbf{x}\in \mathbb{R}^p$. However, simulators often do not provide a way to evaluate the likelihood function for a given observation $\mathbf{x}$, which motivates a new class of likelihood-free inference algorithms. In this paper, we show that likelihood ratios are invariant under a specific class of dimensionality reduction maps $\mathbb{R}^p \to \mathbb{R}$. As a direct consequence, we show that discriminative classifiers can be used to approximate the generalized likelihood ratio statistic when only a generative model for the data is available. This leads to a new machine learning-based approach to likelihood-free inference that is complementary to Approximate Bayesian Computation and does not require a prior on the model parameters. Experimental results on artificial problems with known exact likelihoods illustrate the potential of the proposed method.
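As a minimal illustration of the central idea, the sketch below trains a classifier to separate samples drawn from two simulators and reads an approximate likelihood ratio off its output. Everything in it is an assumption made for illustration and is not taken from the paper: the two "simulators" are one-dimensional unit-variance Gaussians with means 0 and 1 (chosen so that the exact ratio is known), the classifier is a hand-written logistic regression, and the ratio is recovered with the identity $r(\mathbf{x}) \approx s(\mathbf{x})/(1-s(\mathbf{x}))$ for a balanced training set. The separate calibration of the classifier score that the title refers to is omitted here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, lr=0.5, steps=300):
    """Fit s(x) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical simulators: x ~ N(0, 1) under theta0, x ~ N(1, 1) under theta1.
rng = random.Random(0)
n = 2000
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [rng.gauss(1.0, 1.0) for _ in range(n)]

# Label theta0 samples 1 and theta1 samples 0 (balanced classes), so that the
# optimal classifier is s(x) = p(x|theta0) / (p(x|theta0) + p(x|theta1)).
xs = x0 + x1
ys = [1.0] * n + [0.0] * n
w, b = fit_logistic_1d(xs, ys)


def approx_log_ratio(x):
    """log[s(x) / (1 - s(x))], which is the logit w * x + b of the classifier."""
    return w * x + b


def exact_log_ratio(x):
    """log[p(x|theta0) / p(x|theta1)] for the two Gaussians above."""
    return -x + 0.5


for x in (-1.0, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  approx={approx_log_ratio(x):+.3f}  exact={exact_log_ratio(x):+.3f}")
```

In this toy setting the exact log ratio is linear in $x$, so a logistic regression can represent it exactly and the fitted coefficients should land near $w=-1$, $b=0.5$. For realistic simulators the classifier output is only approximately related to the ratio, which is where a calibration step on the score becomes relevant.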
