A Collaborative Training Algorithm for Distributed Learning

In this paper, an algorithm is developed for collaboratively training networks of kernel-linear least-squares regression estimators. The algorithm is shown to distributively solve a relaxation of the classical centralized least-squares regression problem. A statistical analysis shows that the generalization error afforded to agents by the collaborative training algorithm can be bounded in terms of the relationship between the network topology and the representational capacity of the relevant reproducing kernel Hilbert space. Numerical experiments suggest that the algorithm is effective at reducing noise. Because it exploits only local communication, the algorithm is relevant to the problem of distributed learning in wireless sensor networks. Several new questions for statistical learning theory are proposed.
