Learnability and the Vapnik-Chervonenkis dimension

Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on the distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are given.
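
To make the combinatorial parameter concrete: a concept class shatters a set of d points if its concepts realize all 2^d possible labelings of the set, and the VC dimension is the size of the largest shattered set. The following Python sketch (illustrative only, not from the paper) brute-forces this definition for a small class; the choice of closed intervals over a 5-point grid is an assumption made for the example. Intervals on the line have VC dimension 2, since three points x1 < x2 < x3 can never receive the labeling (+, -, +).

    from itertools import combinations

    def shatters(concepts, points):
        # A class shatters `points` if its concepts (membership
        # predicates) realize all 2^|points| labelings of the set.
        labelings = {tuple(c(x) for x in points) for c in concepts}
        return len(labelings) == 2 ** len(points)

    def vc_dimension(concepts, domain, max_d=6):
        # Largest d such that some d-point subset of `domain` is shattered.
        d = 0
        for k in range(1, max_d + 1):
            if any(shatters(concepts, pts) for pts in combinations(domain, k)):
                d = k
            else:
                # Shattering is downward monotone: if no k-set is
                # shattered, no larger set can be shattered either.
                break
        return d

    # Illustrative concept class (an assumption for this sketch):
    # closed intervals [a, b] restricted to a 5-point grid.
    grid = range(5)
    intervals = [
        (lambda x, a=a, b=b: a <= x <= b)
        for a in grid for b in grid if a <= b
    ]
    print(vc_dimension(intervals, list(grid)))  # prints 2

In the paper's terms, finiteness of this parameter (here d = 2 for intervals) is exactly what makes the class distribution-free learnable.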
