Learning from uniformly ergodic Markov chains

Evaluating the generalization performance of learning algorithms has been a central thread of theoretical research in machine learning. Previous bounds describing the generalization performance of the empirical risk minimization (ERM) algorithm are usually established for independent and identically distributed (i.i.d.) samples. In this paper we go beyond this classical framework by establishing generalization bounds for the ERM algorithm with uniformly ergodic Markov chain (u.e.M.c.) samples. We prove bounds on the rate of uniform convergence and of relative uniform convergence of the ERM algorithm with u.e.M.c. samples, and show that the ERM algorithm with u.e.M.c. samples is consistent. The established theory underlies the application of ERM-type learning algorithms.
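To make the setting concrete, here is a minimal sketch (not taken from the paper) of ERM trained on dependent samples drawn along a single trajectory of a uniformly ergodic Markov chain. A finite-state chain whose transition matrix has strictly positive entries satisfies Doeblin's condition and is therefore uniformly ergodic, so its trajectory supplies u.e.M.c. samples. The transition matrix, the threshold hypothesis class, and the noisy label model below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite-state chain whose transition matrix has all entries > 0
# satisfies Doeblin's condition, hence is uniformly ergodic.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
states = np.array([-1.0, 0.0, 1.0])  # state s -> input x

def sample_chain(n, s0=0):
    """Draw n dependent (x, y) samples along one chain trajectory."""
    xs, ys, s = [], [], s0
    for _ in range(n):
        s = rng.choice(3, p=P[s])  # one Markov transition
        x = states[s]
        # Noisy label: sign of x corrupted by Gaussian noise.
        y = 1.0 if x + 0.3 * rng.standard_normal() >= 0 else -1.0
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# Finite hypothesis class: threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(-1.0, 1.0, 21)

def empirical_risk(t, xs, ys):
    """Empirical 0-1 risk of h_t on the (dependent) sample."""
    preds = np.where(xs >= t, 1.0, -1.0)
    return float(np.mean(preds != ys))

xs, ys = sample_chain(2000)
# ERM: pick the hypothesis minimizing empirical risk over the sample.
t_star = min(thresholds, key=lambda t: empirical_risk(t, xs, ys))
print(f"ERM threshold: {t_star:.2f}, "
      f"empirical risk: {empirical_risk(t_star, xs, ys):.3f}")
```

The bounds established in the paper control how far the empirical risk of such an ERM output can deviate from the true risk, despite the dependence among samples along the trajectory.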
