Stationary point variational Bayesian attribute-distributed sparse learning with ℓ1 sparsity constraints

The paper proposes a new variational Bayesian algorithm for ℓ1-penalized multivariate regression with attribute-distributed data. The algorithm combines a variational Bayesian version of the SAGE algorithm, which trains individual agents in a distributed fashion, with sparse Bayesian learning (SBL) based on hierarchical sparsity-prior modeling of the agent weights. SBL imposes sparsity constraints on the weights of individual agents, thus reducing overfitting and removing or suppressing poorly performing agents in the ensemble estimator. The ℓ1 constraint is introduced through a product of a Gaussian and an exponential probability density function, so that the marginalized prior is a Laplace pdf. This hierarchical formulation of the prior allows the stationary points of the variational update expressions for the prior parameters to be computed in closed form, and conditions that ensure convergence to these stationary points to be derived. Experiments with synthetic data demonstrate that the proposed algorithm performs very well in terms of the achieved MSE and outperforms competing algorithms in its ability to sparsify non-informative agents, while at the same time permitting a distributed implementation and flexible agent-update protocols.
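The Gaussian-exponential hierarchy described above is the standard scale-mixture construction of the Laplace prior (as used, e.g., in Bayesian lasso formulations); the paper's exact parameterization may differ, but the marginalization takes the following form:

```latex
% Hierarchical prior on an agent weight w:
%   Gaussian likelihood of w given a variance hyperparameter \gamma,
%   exponential hyperprior on \gamma.
p(w \mid \gamma) = \mathcal{N}(w \mid 0, \gamma), \qquad
p(\gamma \mid \lambda) = \frac{\lambda^2}{2}\,
  \exp\!\left(-\frac{\lambda^2 \gamma}{2}\right), \quad \gamma > 0.

% Marginalizing out \gamma yields a Laplace pdf,
% whose negative log is the \ell_1 penalty \lambda |w| (up to a constant):
p(w \mid \lambda) = \int_0^\infty \mathcal{N}(w \mid 0, \gamma)\,
  p(\gamma \mid \lambda)\, d\gamma
  = \frac{\lambda}{2}\, e^{-\lambda |w|}.
```

Keeping the Gaussian and exponential factors separate, rather than working with the Laplace pdf directly, is what makes the variational updates tractable: conditioned on γ the weight posterior stays Gaussian, and conditioned on w the hyperparameter update has a closed form, which is how the stationary points mentioned in the abstract become computable.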
