On the Local Sensitivity of M-Estimation: Bayesian and Frequentist Applications

Author(s): Giordano, Ryan James | Advisor(s): Jordan, Michael I.; McAuliffe, Jon

Abstract: This thesis uses the local sensitivity of M-estimators to address a number of extant problems in Bayesian and frequentist statistics. First, by exploiting a duality from the Bayesian robustness literature between sensitivity and covariances, I provide significantly improved covariance estimates for mean field variational Bayes (MFVB) procedures at little extra computational cost. Prior to this work, applications of MFVB have arguably been limited to prediction problems rather than inference problems for lack of reliable uncertainty measures. Second, I provide practical finite-sample accuracy bounds for the "infinitesimal jackknife" (IJ), a classical measure of local sensitivity to an empirical process. In doing so, I bridge a gap between classical IJ theory and recent machine learning practice, showing that stringent classical conditions for the consistency of the IJ can be relaxed for restricted but useful classes of weight vectors, such as those of leave-K-out cross-validation. Finally, I provide techniques to quantify the sensitivity of the inferred number of clusters in Bayesian nonparametric (BNP) unsupervised clustering problems to the form of the Dirichlet process prior. By treating local sensitivity as an approximation to global sensitivity rather than as a measure of robustness per se, I provide tools with considerably improved ability to extrapolate to different priors. Because each of these diverse applications is based on the same formal technique, the Taylor series expansion of an M-estimator, this work captures in a unified way the computational difficulties associated with each, and I provide open-source tools in Python and R to assist in their computation.
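
The common device referred to in the abstract is the first-order Taylor expansion of an M-estimator in a vector of data weights: if theta-hat(w) solves sum_i w_i g(theta, x_i) = 0, then d theta-hat / d w_i = -H(theta-hat)^{-1} g(theta-hat, x_i), where H is the Jacobian of the weighted estimating equation, and the infinitesimal jackknife extrapolates linearly in w. The following is a minimal illustrative sketch of that idea, not the thesis's own software; the weighted-least-squares setup, variable names, and left-out index are assumptions made purely for the example.

# Sketch: infinitesimal-jackknife (first-order Taylor) approximation to a
# weighted M-estimator, using weighted least squares so that both the exact
# re-fit and the linear approximation are available in closed form.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(size=n)

def fit(w):
    """Solve the weighted estimating equation sum_i w_i x_i (y_i - x_i' theta) = 0."""
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

w0 = np.ones(n)
theta_hat = fit(w0)

# Local sensitivity at w = 1: d theta_hat / d w_i = -H^{-1} g_i(theta_hat),
# where H = -X'X and g_i = x_i (y_i - x_i' theta_hat) for squared loss.
resid = y - X @ theta_hat
H_inv = np.linalg.inv(X.T @ X)
dtheta_dw = H_inv @ (X * resid[:, None]).T        # shape (d, n)

# Infinitesimal-jackknife prediction for leave-one-out weights vs. exact re-fit.
i = 7                                             # arbitrary left-out point
w_loo = w0.copy(); w_loo[i] = 0.0
theta_exact = fit(w_loo)
theta_ij = theta_hat + dtheta_dw @ (w_loo - w0)   # linear extrapolation in w

print("exact refit:", theta_exact)
print("IJ approx:  ", theta_ij)

The same linearization underlies each application summarized above: the weights select left-out data for cross-validation, index prior perturbations for Bayesian sensitivity analysis, or recover posterior covariances from MFVB sensitivities.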
