Scale-sensitive dimensions, uniform convergence, and learnability

Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first characterization based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to characterize PAC learnability in the statistical regression framework of probabilistic concepts, solving an open problem posed by Kearns and Schapire. Our characterization shows that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
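The abstract does not spell out the combinatorial quantity, so the sketch below assumes one standard scale-sensitive generalization of the VC dimension, the fat-shattering (gamma-shattering) dimension, and computes it by brute force for a tiny finite class. A set of points is gamma-shattered if there are witness levels r_i such that every above/below pattern is realized by some function lying at least gamma above or at least gamma below each r_i. The function names `is_gamma_shattered` and `fat_shattering_dimension` and the dict-based representation of functions are hypothetical choices for illustration, not the paper's construction. For {0,1}-valued classes and 0 < gamma <= 1/2 this quantity coincides with the VC dimension, while for real-valued classes it shrinks as gamma grows, which is the kind of dependence on the accuracy parameter the abstract alludes to.

```python
from itertools import combinations, product


def _realises(f, points, tops, pattern, gamma):
    """True if f is at or above `top` where bit = 1 and at least 2*gamma below it where bit = 0.

    Here top = r + gamma for the witness r, so this is the usual condition
    f(x) >= r + gamma (bit 1) or f(x) <= r - gamma (bit 0).
    """
    return all(
        f[x] >= top if bit else f[x] <= top - 2 * gamma
        for x, top, bit in zip(points, tops, pattern)
    )


def is_gamma_shattered(points, functions, gamma):
    """Brute-force check that `points` is gamma-shattered by a finite class.

    Assumption: each function is a dict mapping domain points to real values.
    """
    # Any feasible witness r_i can be raised to v - gamma, where v is the smallest
    # value an "above" function takes at x_i, without breaking any pattern. So it
    # suffices to try tops v = r_i + gamma drawn from the values the class takes.
    candidate_tops = [sorted({f[x] for f in functions}) for x in points]
    patterns = list(product((0, 1), repeat=len(points)))
    for tops in product(*candidate_tops):
        if all(
            any(_realises(f, points, tops, p, gamma) for f in functions)
            for p in patterns
        ):
            return True
    return False


def fat_shattering_dimension(domain, functions, gamma):
    """Largest d such that some d-point subset of `domain` is gamma-shattered."""
    dim = 0
    for d in range(1, len(domain) + 1):
        if any(is_gamma_shattered(s, functions, gamma) for s in combinations(domain, d)):
            dim = d
        else:
            # Shattering is hereditary (subsets of a shattered set are shattered),
            # so if no d-point set is shattered, no larger set is either.
            break
    return dim


# Hypothetical class: all functions on {"a", "b"} with f(a) in {0, 4} and f(b) in {0, 1}.
functions = [{"a": va, "b": vb} for va in (0, 4) for vb in (0, 1)]
for gamma in (0.5, 1, 2.5):
    print(gamma, fat_shattering_dimension(["a", "b"], functions, gamma))
# prints "0.5 2", then "1 1", then "2.5 0"
```

The search is exponential in the number of points and is only meant to make the definition concrete on toy classes; at gamma = 1 only the point "a", where the class spreads its values over a range of 4, can still be shattered.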

[1] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. 1971.

[2] Norbert Sauer, et al. On the density of families of sets. 1972, J. Comb. Theory A.

[3] S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. 1972.

[4] J. Rissanen, et al. Modeling by shortest data description. 1978, Automatica.

[5] V. Milman. Some remarks about embeddings of ℓ_1^k in finite-dimensional spaces. 1982.

[6] N. Alon, et al. Embedding of ℓ_∞^k in finite-dimensional Banach spaces. 1983.

[7] D. Pollard. Convergence of stochastic processes. 1984.

[8] J. H. van Lint. {0,1,*} distance problems in combinatorics. 1985.

[9] Peter W. Shor, et al. A lower bound for 0, 1, * tournament codes. 1987, Discret. Math.

[10] David Haussler, et al. Occam's razor. 1987, Inf. Process. Lett.

[11] Vladimir Vapnik, et al. Inductive principles of the search for empirical dependences (methods based on weak convergence of probability measures). 1989, COLT '89.

[12] Isabelle Guyon, et al. Structural risk minimization for character recognition. 1991, NIPS.

[13] R. Dudley, et al. Uniform and universal Glivenko-Cantelli classes. 1991.

[14] David Haussler, et al. Decision theoretic generalizations of the PAC model for neural net and other learning applications. 1992, Inf. Comput.

[15] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts. 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.

[16] Philip M. Long, et al. A generalization of Sauer's lemma. 1995, J. Comb. Theory A.