On the MSE Properties of Empirical Bayes Methods for Sparse Estimation

Abstract: Popular convex approaches for sparse estimation, such as the Lasso and Multiple Kernel Learning (MKL), can be derived in a Bayesian setting starting from a particular stochastic model. For problems in which groups of variables must be estimated, we show that the same probabilistic model, under a suitable marginalization, leads to a different, non-convex estimator in which the hyperparameters are optimized. Theoretical arguments, which do not depend on the correctness of the priors entering the sparse model, clarify the advantages of this non-convex technique over MKL and the group version of the Lasso under the assumption of orthogonal regressors.
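The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the estimator analysed in the paper, only a simplified instance built on assumptions that the abstract does not state: the regressors are orthonormal, so each group can be treated separately through its projected data block `z`; the noise variance `sigma2` is known; each group has a zero-mean Gaussian prior with a single scale hyperparameter; and that hyperparameter is chosen by maximizing the marginal likelihood with no hyperprior. The function names, `gamma`, and `sigma2` are hypothetical. Under these assumptions the group Lasso reduces to block soft thresholding, whereas the marginalized approach shrinks each block by a data-dependent factor.

```python
import math


def group_soft_threshold(z, gamma):
    """Group-Lasso estimate for one block, assuming orthonormal regressors.

    Solves min_theta 0.5 * ||z - theta||^2 + gamma * ||theta||_2,
    whose solution is block soft thresholding of z.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return [0.0 for _ in z]
    scale = max(0.0, 1.0 - gamma / norm)
    return [scale * v for v in z]


def empirical_bayes_block(z, sigma2):
    """Empirical Bayes estimate for one block (illustrative assumptions).

    Model: theta ~ N(0, lam * I), z = theta + noise, noise ~ N(0, sigma2 * I),
    with sigma2 > 0 known. The hyperparameter lam is set by maximizing the
    marginal likelihood of z, and theta is estimated by its posterior mean.
    """
    k = len(z)
    energy = sum(v * v for v in z)
    # Marginally z ~ N(0, (lam + sigma2) * I); the maximizer over lam >= 0
    # is the average energy per coordinate minus the noise variance, clipped.
    lam_hat = max(0.0, energy / k - sigma2)
    # Posterior-mean shrinkage factor; exactly zero when lam_hat is zero.
    scale = lam_hat / (lam_hat + sigma2)
    return [scale * v for v in z]


if __name__ == "__main__":
    strong_block = [3.0, 4.0]   # large energy relative to the noise
    weak_block = [0.5, 0.5]     # small energy relative to the noise
    for z in (strong_block, weak_block):
        print(z, group_soft_threshold(z, gamma=1.0),
              empirical_bayes_block(z, sigma2=1.0))
```

In this toy setting both rules set a weak block exactly to zero, but the group Lasso subtracts a fixed amount `gamma` from the block norm, while the marginalized rule applies a shrinkage factor that approaches one as the block energy grows, so strong blocks are biased less.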
