Scale-sensitive dimensions, uniform convergence, and learnability

Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper, we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or “agnostic”) framework. In addition, we find a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
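The abstract does not name the combinatorial quantity. Assuming it is the γ-fat-shattering dimension (the scale-sensitive generalization of the VC dimension that gives reference [23] its title), the following is a minimal brute-force sketch for a finite class of functions on a finite domain. A set of points is γ-shattered if there are witness levels r_x such that every above/below pattern is realized by some function with margin γ. The dimension is the size of the largest such set. The function names and the example class are hypothetical illustrations, not the paper's construction. The search is exponential, so it only suits toy inputs.

```python
from itertools import combinations, product


def _realizes(f, points, base, pattern, gamma):
    """True if f clears witness r_x = c + gamma from above (bit 1) or below (bit 0)."""
    # f[x] >= r_x + gamma  <=>  f[x] >= c + 2*gamma
    # f[x] <= r_x - gamma  <=>  f[x] <= c
    return all(
        (f[x] >= c + 2 * gamma) if bit else (f[x] <= c)
        for x, c, bit in zip(points, base, pattern)
    )


def is_gamma_shattered(funcs, points, gamma):
    """Check whether `points` is gamma-shattered by the finite class `funcs`.

    Each function is a tuple of reals indexed by domain point (an assumption
    of this sketch, not a representation taken from the paper).
    """
    # For a finite class it suffices to try witnesses r_x = c + gamma where c
    # ranges over the observed values {f[x]}: any valid witness r_x can be
    # lowered to max{g[x] + gamma : g[x] <= r_x - gamma} without losing a pattern.
    bases = [sorted({f[x] for f in funcs}) for x in points]
    patterns = list(product((0, 1), repeat=len(points)))
    for base in product(*bases):
        if all(any(_realizes(f, points, base, p, gamma) for f in funcs) for p in patterns):
            return True
    return False


def fat_shattering_dim(funcs, gamma):
    """Size of the largest gamma-shattered subset of the domain (brute force)."""
    n = len(funcs[0])
    best = 0
    for d in range(1, n + 1):
        if any(is_gamma_shattered(funcs, pts, gamma) for pts in combinations(range(n), d)):
            best = d
        else:
            # Subsets of a gamma-shattered set are gamma-shattered,
            # so no larger set can be shattered either.
            break
    return best


# Hypothetical example class on domain {0, 1, 2}: every f with f[i] in {0, a[i]},
# where the amplitudes a = (1.0, 0.5, 0.25) shrink from point to point.
amps = (1.0, 0.5, 0.25)
funcs = list(product(*[(0.0, a) for a in amps]))
for gamma in (0.5, 0.25, 0.125):
    print(gamma, fat_shattering_dim(funcs, gamma))  # dimensions 1, 2, 3
```

On this toy class the dimension is 1, 2, and 3 at γ = 0.5, 0.25, and 0.125. Shrinking the accuracy scale exposes more structure, which is the kind of dependence on the accuracy parameter that the abstract refers to. The witness values are restricted to observed function values, and the tests are written in terms of c rather than r_x. Both choices keep the search finite and avoid adding and then subtracting γ in floating point.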

[1] Vladimir Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[2] Norbert Sauer, et al. On the Density of Families of Sets, 1972, J. Comb. Theory, Ser. A.

[3] S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages, 1972.

[4] J. Rissanen, et al. Modeling by Shortest Data Description, 1978, Autom.

[5] V. Vapnik, et al. Necessary and Sufficient Conditions for the Uniform Convergence of Means to their Expectations, 1982.

[6] Vladimir Vapnik, et al. Estimation of Dependences Based on Empirical Data, Springer Series in Statistics, 1982.

[7] V. Milman. Some remarks about embeddings of ℓ_1^k in finite-dimensional spaces, 1982.

[8] N. Alon, et al. Embedding of ℓ_∞^k in finite dimensional Banach spaces, 1983.

[9] R. Dudley. A course on empirical processes, 1984.

[10] E. Giné, et al. Some Limit Theorems for Empirical Processes, 1984.

[11] J. H. Lint. {0,1,*} distance problems in combinatorics, 1985.

[12] Peter W. Shor, et al. A lower bound for 0, 1, * tournament codes, 1987, Discret. Math.

[13] Vladimir Vapnik, et al. Inductive principles of the search for empirical dependences (methods based on weak convergence of probability measures), 1989, COLT '89.

[14] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[15] D. Pollard. Empirical Processes: Theory and Applications, 1990.

[16] Isabelle Guyon, et al. Structural Risk Minimization for Character Recognition, 1991, NIPS.

[17] R. Dudley, et al. Uniform and universal Glivenko-Cantelli classes, 1991.

[18] B. M. Fulk. MATH, 1992.

[19] Shai Ben-David, et al. Characterizations of learnability for classes of {0, ..., n}-valued functions, 1992, COLT '92.

[20] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[21] Hans Ulrich Simon, et al. Bounds on the Number of Examples Needed for Learning Functions, 1994, SIAM J. Comput.

[22] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.

[23] Philip M. Long, et al. Fat-shattering and the learnability of real-valued functions, 1994, COLT '94.

[24] Philip M. Long, et al. A Generalization of Sauer's Lemma, 1995, J. Comb. Theory, Ser. A.

[25] Philip M. Long, et al. More theorems about scale-sensitive dimensions and learning, 1995, COLT '95.

[26] Philip M. Long, et al. Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions, 1995, J. Comput. Syst. Sci.