Sparse Bayesian Learning with Diagonal Quasi-Newton Method for Large-Scale Classification

Sparse Bayesian Learning (SBL) constructs an extremely sparse probabilistic model with very competitive generalization. However, SBL needs to invert a large covariance matrix with complexity O(M^3) (M: feature size) to update the regularization priors, which limits its practical use. Three issues arise in SBL: 1) inverting the covariance matrix may yield singular solutions in some cases, which prevents SBL from converging; 2) SBL scales poorly to problems with high-dimensional feature spaces or large data sizes; 3) SBL easily suffers from memory overflow on large-scale data. This paper addresses these issues with a newly proposed diagonal quasi-Newton (DQN) method for SBL, called DQN-SBL, in which the inversion of the large covariance matrix is avoided so that both the time complexity and the memory storage are reduced to O(M). DQN-SBL is thoroughly evaluated on non-linear classification and linear feature selection using various benchmark datasets of different sizes. Experimental results verify that DQN-SBL achieves competitive generalization with a very sparse model and scales well to large-scale problems.

Index Terms—diagonal quasi-Newton method, sparse Bayesian learning, large-scale problems, sparse model
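To make the complexity argument concrete, the sketch below illustrates one way a diagonal quasi-Newton update can stand in for the full covariance inversion in SBL, here for logistic regression with ARD priors. The function name, the diagonal secant update, the clipping thresholds, and the MacKay-style alpha re-estimation are illustrative assumptions rather than the paper's exact algorithm; the point is that every step touches only M-vectors, so time and memory stay O(M) per update instead of the O(M^3) cost of inverting an M×M covariance matrix.

```python
# A minimal sketch (not the authors' exact algorithm) of the idea behind
# DQN-SBL for logistic regression with ARD priors: a diagonal Hessian
# approximation `d` replaces the M x M posterior covariance, so no
# O(M^3) inversion is needed and storage stays at O(M).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def dqn_sbl_sketch(X, t, n_outer=20, n_inner=50, eps=1e-8):
    """X: (N, M) design matrix; t: (N,) binary targets in {0, 1}."""
    N, M = X.shape
    w = np.zeros(M)
    alpha = np.ones(M)                     # ARD precisions, one per weight
    for _ in range(n_outer):
        d = np.ones(M)                     # diagonal approx. of Hessian + prior
        w_old, g_old = None, None
        for _ in range(n_inner):
            p = sigmoid(X @ w)
            g = X.T @ (p - t) + alpha * w  # gradient of the penalized NLL
            if g_old is not None:
                s, y = w - w_old, g - g_old
                mask = np.abs(s) > eps
                # diagonal secant condition, clipped to stay positive
                d[mask] = np.clip(y[mask] / s[mask], eps, 1e8)
            w_old, g_old = w.copy(), g.copy()
            w = w - g / d                  # O(M) quasi-Newton step
        # 1/d approximates the posterior variance, so the usual
        # MacKay-style ARD update also costs only O(M)
        gamma = np.clip(1.0 - alpha / d, eps, 1.0)
        alpha = gamma / (w ** 2 + eps)
    return w, alpha
```

Because only the M-vector d is stored in place of the M×M posterior covariance, memory grows linearly with the feature size, which is what allows this style of update to scale to high-dimensional problems.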
