Consistency in models for distributed learning under communication constraints

Motivated by sensor networks and other distributed settings, several models for distributed learning are presented. The models differ from classical work in statistical pattern recognition in that observations of an independent and identically distributed (i.i.d.) sampling process are allocated among the members of a network of simple learning agents. The agents are limited in their ability to communicate with a central fusion center, and thus the amount of information available for classification or regression is constrained. For several basic communication models in both the binary classification and regression frameworks, we ask whether there exist agent decision rules and fusion rules that yield a universally consistent ensemble; the answers raise new issues to consider with regard to universal consistency. In short, this paper addresses whether the guarantees provided by Stone's theorem in centralized environments continue to hold in distributed settings.
