Pattern Recognition for Conditionally Independent Data

In this work we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is the problem of guessing the discrete label of an object from a set of given examples (pairs of objects and labels). We consider the case of deterministically defined labels. Traditionally, this task is studied under the assumption that the examples are independent and identically distributed. However, it turns out that many results of pattern recognition theory carry over to a weaker assumption: the objects are conditionally independent and identically distributed given the labels, while the only assumption on the distribution of labels is that the rate of occurrence of each label is bounded away from zero. We identify a broad class of learning algorithms for which the bounds on the probability of classification error obtained under the classical i.i.d. assumption generalize to similar bounds for the case of conditionally i.i.d. examples.
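To make the setting concrete, here is a rough formalization in our own notation (the threshold $\delta$ and the class-conditional distributions $P_y$ are our names for the quantities described above, not necessarily the paper's notation): the labels $y_1, y_2, \dots$ may be generated by an arbitrary process, subject only to a positive-rate condition, while given the label sequence the objects are drawn independently from fixed class-conditional distributions,
\[
x_i \mid (y_1, y_2, \dots) \sim P_{y_i} \quad \text{independently in } i,
\qquad
\liminf_{n \to \infty} \frac{1}{n}\,\bigl|\{\, i \le n : y_i = y \,\}\bigr| \ge \delta > 0 \quad \text{for each label } y.
\]
Under the classical i.i.d. assumption the pairs $(x_i, y_i)$ would themselves be independent draws from a single joint distribution; here only the first condition is retained, so arbitrary dependence in the label sequence is allowed as long as no label becomes vanishingly rare.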
