Learning from dependent observations

Most papers establishing consistency for learning algorithms assume that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) require only that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for α-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise.
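For readers unfamiliar with the mixing condition named in the abstract, the following is a minimal sketch in standard notation (the conventional strong-mixing definition and the usual regularized-ERM formulation of SVMs common in Steinwart's work); the symbols $H$, $L$, $\lambda$, and $f_{D,\lambda}$ are conventions assumed here, not extracted from the paper itself.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% alpha-mixing (strong mixing): standard definition, not verbatim from the paper.
Let $Z = (Z_i)_{i \ge 1}$ be a stochastic process and write
$\mathcal{A}_1^n = \sigma(Z_1, \dots, Z_n)$ and
$\mathcal{A}_{n+k}^\infty = \sigma(Z_{n+k}, Z_{n+k+1}, \dots)$.
The $\alpha$-mixing coefficients are
\[
  \alpha(k) \;=\; \sup_{n \ge 1}\;
  \sup_{A \in \mathcal{A}_1^n,\; B \in \mathcal{A}_{n+k}^\infty}
  \bigl| P(A \cap B) - P(A)\,P(B) \bigr|,
\]
and $Z$ is called $\alpha$-mixing if $\alpha(k) \to 0$ as $k \to \infty$;
note that stationarity is not required.

% SVM as regularized empirical risk minimization over an RKHS $H$.
Given a kernel with reproducing kernel Hilbert space $H$, a convex loss
$L$, a regularization parameter $\lambda > 0$, and observations
$(x_1, y_1), \dots, (x_n, y_n)$, the SVM decision function is
\[
  f_{D,\lambda} \;=\; \operatorname*{arg\,min}_{f \in H}\;
  \lambda \|f\|_H^2 + \frac{1}{n} \sum_{i=1}^n L\bigl(y_i, f(x_i)\bigr),
\]
so a law of large numbers for the empirical risk term is the natural
minimal requirement on the data-generating process.

\end{document}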
