Sparse Bayesian Learning with Diagonal Quasi-Newton Method for Large-Scale Classification

Sparse Bayesian Learning (SBL) constructs an extremely sparse probabilistic model with very competitive generalization. However, SBL needs to invert a large covariance matrix with complexity O(M^3) (M: feature size) to update the regularization priors, which limits its practical use. Three issues arise in SBL: 1) inverting the covariance matrix may yield singular solutions in some cases, which prevents SBL from converging; 2) SBL scales poorly to problems with high-dimensional feature spaces or large data sizes; 3) SBL easily suffers from memory overflow on large-scale data. This paper addresses these issues with a newly proposed diagonal quasi-Newton (DQN) method for SBL, called DQN-SBL, in which the inversion of the large covariance matrix is avoided so that both the time complexity and the memory storage are reduced to O(M). DQN-SBL is thoroughly evaluated on non-linear classification and linear feature selection using various benchmark datasets of different sizes. Experimental results verify that DQN-SBL achieves competitive generalization with a very sparse model and scales well to large-scale problems.

Index Terms—diagonal quasi-Newton method, sparse Bayesian learning, large-scale problems, sparse model
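To make the complexity argument concrete, the sketch below illustrates one way a diagonal quasi-Newton update can stand in for the full covariance inversion in SBL, here for logistic regression with ARD priors. The function name, the diagonal secant update, the clipping thresholds, and the MacKay-style alpha re-estimation are illustrative assumptions rather than the paper's exact algorithm; the point is that every step touches only M-vectors, so time and memory stay O(M) per update instead of the O(M^3) cost of inverting an M×M covariance matrix.

```python
# A minimal sketch (not the authors' exact algorithm) of the idea behind
# DQN-SBL for logistic regression with ARD priors: a diagonal Hessian
# approximation `d` replaces the M x M posterior covariance, so no
# O(M^3) inversion is needed and storage stays at O(M).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def dqn_sbl_sketch(X, t, n_outer=20, n_inner=50, eps=1e-8):
    """X: (N, M) design matrix; t: (N,) binary targets in {0, 1}."""
    N, M = X.shape
    w = np.zeros(M)
    alpha = np.ones(M)                     # ARD precisions, one per weight
    for _ in range(n_outer):
        d = np.ones(M)                     # diagonal approx. of Hessian + prior
        w_old, g_old = None, None
        for _ in range(n_inner):
            p = sigmoid(X @ w)
            g = X.T @ (p - t) + alpha * w  # gradient of the penalized NLL
            if g_old is not None:
                s, y = w - w_old, g - g_old
                mask = np.abs(s) > eps
                # diagonal secant condition, clipped to stay positive
                d[mask] = np.clip(y[mask] / s[mask], eps, 1e8)
            w_old, g_old = w.copy(), g.copy()
            w = w - g / d                  # O(M) quasi-Newton step
        # 1/d approximates the posterior variance, so the usual
        # MacKay-style ARD update also costs only O(M)
        gamma = np.clip(1.0 - alpha / d, eps, 1.0)
        alpha = gamma / (w ** 2 + eps)
    return w, alpha
```

Because only the M-vector d is stored in place of the M×M posterior covariance, memory grows linearly with the feature size, which is what allows this style of update to scale to high-dimensional problems.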
