Probably Approximately Correct Learning

This paper surveys recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then review some of the results obtained in it. We then consider criticisms of the PAC model and the extensions proposed to address them. Finally, we look briefly at other models recently proposed in computational learning theory.
