Space-alternating attribute-distributed sparse learning

The paper proposes a new variational Bayesian algorithm for multivariate regression with attribute-distributed (dimensionally distributed) data. In contrast to existing approaches, the proposed algorithm exploits a variational version of the Space-Alternating Generalized Expectation-Maximization (SAGE) algorithm: by means of admissible hidden data, an analog of the complete data in the EM framework, the parameters of a single agent can be updated while the parameters of the other agents are kept fixed. Learning can therefore be implemented in a distributed fashion by updating the agents sequentially, one after another. Inspired by Bayesian sparsity techniques, the algorithm also constrains the agent parameters via parametric priors, which provides a mechanism for pruning irrelevant agents and for reducing the effect of overfitting. Using synthetic data, as well as measurement data from the UCI Machine Learning Repository, it is demonstrated that the proposed algorithm outperforms existing solutions both in achieved mean-square error (MSE) and in convergence speed, owing to its ability to sparsify noninformative agents, while at the same time allowing distributed implementation and flexible agent update protocols.
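The sequential, agent-by-agent update with sparsity-driven pruning described above can be illustrated with a toy sketch. This is not the paper's actual variational SAGE derivation: each "agent" here owns a single attribute column, the ARD-style precision update and the pruning threshold are illustrative assumptions, and the noise precision is assumed known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic attribute-distributed data: each agent owns one attribute column.
n, m = 200, 10
X = rng.normal(size=(n, m))
w_true = np.zeros(m)
w_true[:3] = [2.0, -1.5, 0.8]           # only three informative agents
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(m)                          # each agent's local weight
alpha = np.ones(m)                       # per-agent prior precisions (ARD-style)
beta = 1.0 / 0.01                        # assumed (known) noise precision
active = np.ones(m, dtype=bool)

for sweep in range(20):
    for j in range(m):                   # sequential: one agent at a time,
        if not active[j]:                # others held fixed
            continue
        r = y - X @ w + X[:, j] * w[j]   # residual with agent j's term removed
        s = beta * X[:, j] @ X[:, j] + alpha[j]
        w[j] = beta * (X[:, j] @ r) / s  # posterior-mean update for agent j
        alpha[j] = 1.0 / (w[j] ** 2 + 1.0 / s)  # ARD-style precision update
        if alpha[j] > 1e3:               # prune a noninformative agent
            w[j] = 0.0
            active[j] = False

print("active agents:", np.flatnonzero(active))
```

On this synthetic set the precisions of the noninformative agents grow rapidly, so those agents are pruned and excluded from later sweeps, which is the mechanism behind the claimed convergence-speed gain.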
