An Introduction to Bayesian Inference via Variational Approximations

Markov chain Monte Carlo (MCMC) methods have facilitated an explosion of interest in Bayesian methods. MCMC is a useful and important tool, but it can face difficulties when used to estimate complex posteriors or to fit models to large data sets. In this paper, we show how variational approximations, a tool recently developed in computer science for fitting Bayesian models, can be used to facilitate the application of Bayesian models to political science data. Variational approximations are often much faster than MCMC for fully Bayesian inference, and in some instances they facilitate the estimation of models that would otherwise be impossible to estimate. As a deterministic posterior approximation method, variational approximations are guaranteed to converge, and convergence is easily assessed. But variational approximations do have some limitations, which we detail below. Therefore, variational approximations are best suited to problems in which fully Bayesian inference would otherwise be impossible. Through a series of examples, we demonstrate how variational approximations are useful for a variety of political science research. These examples include models of legislative voting blocs and statistical models for political texts. The code that implements the models in this paper is available in the supplementary material.
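
To make the idea of a deterministic, easily monitored approximation concrete, the following is a minimal sketch of coordinate-ascent variational inference. It is not one of the models from this paper. It assumes a textbook conjugate setup: data x_i ~ N(mu, 1/tau), priors mu | tau ~ N(mu0, 1/(lambda0 * tau)) and tau ~ Gamma(a0, b0), and a factorized approximation q(mu) q(tau). The function name `cavi_normal`, its default hyperparameters, and the simulated data are all hypothetical choices made for illustration.

```python
import random


def cavi_normal(x, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0,
                tol=1e-10, max_iter=1000):
    """Mean-field approximation q(mu) q(tau) for a normal model.

    q(mu) is N(mu_n, 1 / lambda_n) and q(tau) is Gamma(a_n, b_n).
    All names and defaults here are illustrative assumptions.
    """
    n = len(x)
    xbar = sum(x) / n

    # These two quantities are fixed by the data and the priors,
    # so they do not change across iterations.
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)
    a_n = a0 + (n + 1) / 2.0

    e_tau = a0 / b0  # initial guess for E_q[tau] (the prior mean)
    b_n = b0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Update q(mu): its precision depends on the current E_q[tau].
        lambda_n = (lambda0 + n) * e_tau
        var_mu = 1.0 / lambda_n

        # Update q(tau): expected squared deviations under q(mu).
        data_sq = sum((xi - mu_n) ** 2 for xi in x) + n * var_mu
        prior_sq = lambda0 * ((mu_n - mu0) ** 2 + var_mu)
        b_n = b0 + 0.5 * (data_sq + prior_sq)

        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break

    return {
        "mu_n": mu_n,
        "lambda_n": (lambda0 + n) * e_tau,
        "a_n": a_n,
        "b_n": b_n,
        "e_tau": e_tau,
        "iterations": iterations,
    }


# Simulated data, used only to exercise the sketch.
random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(500)]
result = cavi_normal(data)
print(result["mu_n"], result["e_tau"], result["iterations"])
```

Because every update is a closed-form function of the current parameter values, running the routine twice on the same data returns identical output, and convergence can be checked by watching a single scalar stop changing. An MCMC sampler, by contrast, produces random draws whose convergence has to be diagnosed from the behavior of the chain. In this sketch the stopping rule monitors the change in E_q[tau]; monitoring the variational lower bound is a common alternative.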
