Sparse kernel learning with LASSO and Bayesian inference algorithm

Kernelized LASSO (Least Absolute Shrinkage and Selection Operator) has been investigated in two separate recent papers [Gao, J., Antolovich, M., & Kwan, P. H. (2008). L1 LASSO and its Bayesian inference. In W. Wobcke, & M. Zhang (Eds.), Lecture notes in computer science: Vol. 5360 (pp. 318-324); Wang, G., Yeung, D. Y., & Lochovsky, F. (2007). The kernel path in kernelized LASSO. In International conference on artificial intelligence and statistics (pp. 580-587). San Juan, Puerto Rico: MIT Press]. This paper is concerned with learning kernels under the LASSO formulation by adopting a generative Bayesian learning and inference approach. A new robust learning algorithm is proposed that produces a sparse kernel model and can learn both the regularization parameters and the kernel hyperparameters. The algorithm is compared with state-of-the-art methods for constructing sparse regression models, such as the relevance vector machine (RVM) and local regularization assisted orthogonal least squares regression (LROLS). The new algorithm is also demonstrated to possess considerable computational advantages.
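
For readers unfamiliar with the kernelized LASSO formulation referred to above, the following is a minimal sketch of the underlying point-estimate problem: an L1-penalized least squares fit over kernel basis functions centred on the training inputs. It is not the Bayesian learning and inference algorithm proposed in the paper. The solver (plain cyclic coordinate descent), the Gaussian kernel, and all names (`rbf_kernel`, `kernel_lasso`, `lam`, `width`) are assumptions chosen for illustration, and both `lam` and `width` are fixed by hand here rather than learned.

```python
import math


def rbf_kernel(x, z, width=1.0):
    """Gaussian (RBF) kernel between two scalars; `width` is an assumed, hand-set hyperparameter."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the proximal map of the absolute value)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def objective(K, ys, w, lam):
    """Evaluate 0.5 * ||y - K w||^2 + lam * ||w||_1."""
    n = len(ys)
    fit = sum((ys[i] - sum(K[i][j] * w[j] for j in range(n))) ** 2 for i in range(n))
    return 0.5 * fit + lam * sum(abs(v) for v in w)


def kernel_lasso(xs, ys, lam=0.5, width=1.0, sweeps=200):
    """Minimise the objective above by cyclic coordinate descent (illustrative solver only)."""
    n = len(xs)
    K = [[rbf_kernel(xs[i], xs[j], width) for j in range(n)] for i in range(n)]
    w = [0.0] * n
    residual = list(ys)  # equals y - K w, and w starts at zero
    col_sq = [sum(K[i][j] ** 2 for i in range(n)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            # Correlation of column j with the residual that excludes w[j]'s own contribution.
            rho = sum(K[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= K[i][j] * delta
                w[j] = new_wj
    return w, K


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(30)]
    ys = [math.sin(x) for x in xs]
    w, K = kernel_lasso(xs, ys, lam=0.5)
    print("non-zero weights:", sum(1 for v in w if v != 0.0), "of", len(w))
    print("objective:", objective(K, ys, w, 0.5))
```

The soft-thresholding step is what sets individual kernel weights exactly to zero, which is the source of sparsity in a LASSO model. Choosing `lam` and `width` by hand, as done here, is the step that a Bayesian treatment of the kind described in the abstract aims to replace with learning from data.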
