Regression Density Estimation With Variational Methods and Stochastic Approximation

Regression density estimation is the problem of flexibly estimating a response distribution as a function of covariates. An important approach uses finite mixture models. This article considers flexible mixtures of heteroscedastic regression (MHR) models, in which the response distribution is a normal mixture whose component means, variances, and mixture weights all vary as functions of covariates. We develop fast variational approximation (VA) methods for inference. Our motivation is that the computationally intensive Markov chain Monte Carlo (MCMC) alternatives for fitting mixture models are difficult to apply when models must be fitted repeatedly, as in exploratory analysis and model choice. The article makes three contributions. First, we describe a VA for MHR models in which the variational lower bound is available in closed form. Second, we show that the basic approximation can be improved by using stochastic approximation (SA) methods to perturb the initial solution and attain higher accuracy. Third, we illustrate the advantages of our approach over MCMC-based approaches for model choice and evaluation. These advantages are particularly compelling for time series data, where repeated refitting is very common, both for one-step-ahead prediction in model choice and diagnostics and for rolling-window computations. Supplementary materials for the article are available online.
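
As a concrete illustration of the MHR model class described above, the following minimal sketch evaluates a conditional density p(y | x) for a normal mixture whose means, variances, and weights all depend on covariates. The specific parameterization (linear means, log-linear variances, softmax mixture weights), the function name `mhr_density`, and the argument names `betas`, `alphas`, and `gammas` are illustrative assumptions, not details stated in the abstract. The sketch covers density evaluation only, not the variational or stochastic approximation fitting steps.

```python
import math


def mhr_density(y, x, betas, alphas, gammas):
    """Evaluate p(y | x) for a K-component mixture of heteroscedastic regressions.

    Assumed (hypothetical) parameterization, for illustration only:
      mean_k(x)     = x . betas[k]
      log var_k(x)  = x . alphas[k]
      weight_k(x)   = softmax over k of x . gammas[k]
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # Covariate-dependent mixture weights via a numerically stable softmax.
    scores = [dot(x, g) for g in gammas]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of normal densities with covariate-dependent mean and variance.
    density = 0.0
    for w, beta, alpha in zip(weights, betas, alphas):
        mean = dot(x, beta)
        var = math.exp(dot(x, alpha))
        density += w * math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return density
```

Calling `mhr_density(0.0, [1.0, 0.5], betas, alphas, gammas)` with one coefficient vector per component returns the mixture density at y = 0 for the covariate vector (1, 0.5). Working on the log-variance scale keeps every component variance positive without constraints on `alphas`.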
