Toward efficient agnostic learning

In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that significantly weaken the assumptions on the target function. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions about the target function. The name derives from the fact that, as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for learning in a model for problems involving hidden variables.
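To make the dynamic-programming approach concrete, the sketch below shows one natural instance of agnostic empirical-risk minimization on the line: finding the hypothesis with fewest sample errors among all functions that are constant on at most k intervals. This is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm; the function name best_piecewise_constant, the binary labels, and the parameter k are choices made here for concreteness.

    # A minimal sketch of agnostic learning via dynamic programming:
    # empirical-error minimization over k-piecewise-constant hypotheses
    # on the real line. Illustrative only; names and the hypothesis
    # class are assumptions, not taken from the paper.

    from typing import List, Tuple

    def best_piecewise_constant(points: List[Tuple[float, int]], k: int) -> int:
        """Minimum number of sample errors achievable by any hypothesis
        that is constant on at most k intervals of the line."""
        pts = sorted(points)                   # sort by x-coordinate
        labels = [y for _, y in pts]
        n = len(labels)

        # Prefix counts of label 1, so any segment's error is O(1).
        ones = [0] * (n + 1)
        for i, y in enumerate(labels):
            ones[i + 1] = ones[i] + y

        def seg_err(i: int, j: int) -> int:
            # Errors of the best single constant label on pts[i:j]:
            # predicting 1 errs on the 0s, predicting 0 errs on the 1s.
            c1 = ones[j] - ones[i]
            return min(c1, (j - i) - c1)

        # dp[t][j]: min errors on the first j points using at most t pieces.
        INF = n + 1
        dp = [[INF] * (n + 1) for _ in range(k + 1)]
        dp[0][0] = 0
        for t in range(1, k + 1):
            dp[t][0] = 0
            for j in range(1, n + 1):
                dp[t][j] = min(dp[t - 1][i] + seg_err(i, j) for i in range(j))
        return dp[k][n]

    sample = [(0.1, 1), (0.2, 1), (0.5, 0), (0.7, 0), (0.9, 1)]
    print(best_piecewise_constant(sample, k=2))   # -> 1

Because no target assumption is made, the returned quantity is simply the best empirical fit within the class; the recurrence runs in O(k n^2) time, and standard uniform-convergence bounds then relate this empirical optimum to the true error of the selected hypothesis.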
