Generalization and Robustness of Batched Weighted Average Algorithm with V-Geometrically Ergodic Markov Data

We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization (ERM) algorithm when the latter overfits or when minimizing the empirical risk is computationally hard. For generalization, we prove a PAC-style bound on the training sample size required for the expected L1-loss to converge to the optimal loss when the training data form a V-geometrically ergodic Markov chain. For robustness, we show that if the values of the training target variable are corrupted by bounded noise, then the generalization bound of the algorithm deviates by at most the range of the noise. Our results apply to regression, to classification, and to the setting in which an unknown deterministic target hypothesis exists.
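
The abstract does not spell the algorithm out, but its two ingredients, batching a dependent sample and averaging hypotheses with loss-dependent weights, can be illustrated with a minimal sketch. The Python below is a hypothetical rendering under stated assumptions, not the paper's exact construction: it assumes a finite class of callable hypotheses, thins the chain to one observation per batch (so that, for a fast-mixing chain, the retained points are nearly independent), scores each hypothesis by its empirical L1 loss on those points, and predicts with an exponentially weighted average. The function name, the thinning scheme, and the weighting constant `scale` are all illustrative choices.

```python
import math

def batched_weighted_average(hypotheses, sample, batch_size, scale=1.0):
    """Hypothetical sketch of a batched weighted average learner.

    hypotheses -- finite list of candidate predictors h(x) -> float
    sample     -- list of (x, y) pairs drawn from a dependent (Markov) chain
    batch_size -- gap between retained points; for a V-geometrically
                  ergodic chain, widely spaced points are nearly independent
    scale      -- temperature of the exponential weighting (an assumption)
    """
    # Thin the chain: keep one observation per batch to weaken dependence.
    probes = sample[::batch_size]
    m = len(probes)

    # Score each hypothesis by its empirical L1 loss on the thinned sample,
    # then weight it exponentially in that loss (better fit -> larger weight).
    weights = []
    for h in hypotheses:
        err = sum(abs(h(x) - y) for x, y in probes) / m
        weights.append(math.exp(-scale * m * err))
    total = sum(weights)

    # Predict with the weighted average of all hypotheses rather than
    # with the single empirical risk minimizer.
    def averaged(x):
        return sum(w * h(x) for w, h in zip(weights, hypotheses)) / total

    return averaged
```

For instance, with `hypotheses = [lambda x: 0.0, lambda x: x, lambda x: 2 * x]` and a sample generated by a mixing chain, the returned predictor concentrates its weight on hypotheses with small batched empirical error; averaging instead of selecting a single minimizer is what gives this family of methods its resistance to overfitting.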
