Bayesian Generalized Kernel Mixed Models

We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which extend generalized linear mixed models to the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman's g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be exactly zero, so it both induces sparse models and facilitates Bayesian computation. In particular, we exploit data augmentation to develop a Markov chain Monte Carlo (MCMC) algorithm that uses reversible jump moves for model selection and Bayesian model averaging for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loève expansion of a Gaussian process (GP). Our sparse modeling framework therefore yields a flexible approximation method for GPs.
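To make the point-mass/g-prior mixture concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It draws a sparse regression vector for a GKM whose latent function is f(x_i) = sum_j beta_j k(x_i, x_j). Several choices are assumptions made only for illustration: the Gaussian (RBF) kernel, independent Bernoulli(pi) inclusion indicators, and the active block following N(0, g * sigma2 * inv(K_gamma)). The names `rbf_kernel`, `sample_sparse_gkm_prior`, `pi`, `g`, `sigma2`, and `jitter` are hypothetical. The sketch shows only a prior draw. The data-augmentation and reversible-jump posterior updates are not attempted.

```python
import numpy as np


def rbf_kernel(X, Z, length_scale=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-||x_i - z_j||^2 / (2 * length_scale^2))."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


def sample_sparse_gkm_prior(K, pi=0.2, g=1.0, sigma2=1.0, jitter=1e-6, rng=None):
    """Draw (gamma, beta) from an assumed point-mass / g-prior mixture.

    Hypothetical parameterization (illustrative only):
      gamma_j ~ Bernoulli(pi) independently,
      beta_j = 0 whenever gamma_j = 0 (the point mass at zero),
      beta_gamma ~ N(0, g * sigma2 * inv(K_gamma)) on the active block,
    where K_gamma is the submatrix of K indexed by the active components.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = K.shape[0]
    gamma = rng.random(n) < pi          # inclusion indicators
    beta = np.zeros(n)                  # inactive components stay exactly zero
    idx = np.flatnonzero(gamma)
    if idx.size > 0:
        # Jitter keeps the Cholesky factorization stable for near-singular kernels.
        K_g = K[np.ix_(idx, idx)] + jitter * np.eye(idx.size)
        L = np.linalg.cholesky(K_g)     # K_g = L @ L.T
        z = rng.standard_normal(idx.size)
        # If L.T @ b = z, then Cov(b) = inv(L @ L.T) = inv(K_g).
        beta[idx] = np.sqrt(g * sigma2) * np.linalg.solve(L.T, z)
    return gamma, beta


# Usage: prior draw of a sparse latent function f(x_i) = sum_j beta_j k(x_i, x_j).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
K = rbf_kernel(X, X, length_scale=1.0)
gamma, beta = sample_sparse_gkm_prior(K, pi=0.2, rng=rng)
f = K @ beta  # only columns with gamma_j = 1 contribute
print(f"{gamma.sum()} of {gamma.size} components active")
```

The sketch samples through the Cholesky factor of K_gamma, which avoids forming its inverse explicitly. Because inactive coefficients are exactly zero, f depends only on the kernel columns of the selected points.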
