Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics

A central question in many probabilistic clustering problems is how many distinct clusters are present in a particular dataset. A Bayesian nonparametric (BNP) model addresses this question by placing a generative process on cluster assignment. However, like all Bayesian approaches, BNP requires the specification of a prior. In practice, it is important to quantitatively establish that the prior is not too informative, particularly when the particular form of the prior is chosen for mathematical convenience rather than because of a considered subjective belief. We derive local sensitivity measures for a truncated variational Bayes (VB) approximation and approximate nonlinear dependence of a VB optimum on prior parameters using a local Taylor series approximation. Using a stick-breaking representation of a Dirichlet process, we consider perturbations both to the scalar concentration parameter and to the functional form of the stick-breaking distribution. Unlike previous work on local Bayesian sensitivity for BNP, we pay special attention to the ability of our sensitivity measures to extrapolate to different priors, rather than treating the sensitivity as a measure of robustness per se. Extrapolation motivates the use of multiplicative perturbations to the functional form of the prior for VB. Additionally, we linearly approximate only the computationally intensive part of inference -- the optimization of the global parameters -- and retain the nonlinearity of easily computed quantities as functions of the global parameters. We apply our methods to estimate sensitivity of the expected number of distinct clusters present in the Iris dataset to the BNP prior specification. We evaluate the accuracy of our approximations by comparing to the much more expensive process of re-fitting the model.
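
As a sketch of the linearization described in the abstract (our notation, not necessarily the paper's): write $\hat\eta(\epsilon)$ for the variational parameters that minimize the KL objective when the prior is indexed by a scalar perturbation $\epsilon$, with $\epsilon = 0$ the original prior. Under standard regularity conditions, the implicit function theorem gives the local sensitivity and the corresponding first-order extrapolation

$$
\frac{d\hat\eta}{d\epsilon}\bigg|_{\epsilon=0}
  = -\left(\frac{\partial^2 \mathrm{KL}}{\partial\eta\,\partial\eta^{\mathsf T}}\right)^{-1}
    \frac{\partial^2 \mathrm{KL}}{\partial\eta\,\partial\epsilon}
    \bigg|_{\eta=\hat\eta(0),\;\epsilon=0},
\qquad
\hat\eta(\epsilon) \;\approx\; \hat\eta(0) + \frac{d\hat\eta}{d\epsilon}\bigg|_{\epsilon=0}\,\epsilon.
$$

A posterior summary such as the expected number of distinct clusters, written $g(\hat\eta)$, is then obtained by applying the (possibly nonlinear) map $g$ to the linearized $\hat\eta(\epsilon)$; this is what the abstract means by retaining the nonlinearity of easily computed quantities as functions of the global parameters.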

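The same computation can be carried out with automatic differentiation. The sketch below is purely illustrative and is not the paper's code: it uses a toy quadratic objective `kl` standing in for the variational KL divergence, with `alpha` playing the role of a prior hyperparameter such as the concentration parameter; the names `kl`, `eta_hat`, `alpha0`, and `alpha_new` are our own assumptions.

```python
# Illustrative sketch: first-order sensitivity of a VB optimum to a prior
# hyperparameter, via the implicit function theorem and JAX autodiff.
import jax
import jax.numpy as jnp
from jax.scipy.optimize import minimize

def kl(eta, alpha):
    # Toy stand-in for the variational KL objective: a quadratic in the
    # variational parameters eta, with the prior hyperparameter alpha
    # entering through a penalty term.
    return jnp.sum((eta - 1.0) ** 2) + alpha * jnp.sum(eta ** 2)

alpha0 = 1.0
# Fit the variational optimum once, at the original hyperparameter value.
eta_hat = minimize(lambda e: kl(e, alpha0), jnp.zeros(3), method="BFGS").x

# Implicit-function-theorem sensitivity:
#   d eta_hat / d alpha = -H^{-1} (d^2 KL / d eta d alpha).
hess = jax.hessian(kl, argnums=0)(eta_hat, alpha0)
cross = jax.jacobian(jax.grad(kl, argnums=0), argnums=1)(eta_hat, alpha0)
sens = -jnp.linalg.solve(hess, cross)

# Extrapolate the optimum to a new hyperparameter value without re-fitting,
# then apply any (nonlinear) posterior summary to the extrapolated optimum.
alpha_new = 1.5
eta_lin = eta_hat + sens * (alpha_new - alpha0)
print(eta_lin)  # compare against re-optimizing kl at alpha_new
```

The expensive step, fitting the variational optimum, is performed only once at the original prior; the Hessian factorization and linear solve are likewise done once, after which extrapolating to new prior values is essentially free compared with re-fitting the model.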