Learning in the presence of malicious errors

We study a practical extension to the Valiant model of machine learning from examples [V84]: the presence of errors, possibly maliciously generated by an adversary, in the sample data. Recent papers have made progress in the Valiant model by providing algorithms for learning various classes of functions, by giving evidence for the intractability of learning other classes, and by developing general tools and techniques for determining learnability (see e.g. [BEHW86], [KLPV87], [R87]). These results assume an error-free oracle for examples of the function being learned. In many environments, however, there is always some chance that an erroneous example is given to the learning algorithm. In a training session for an expert system, this might be due to an occasionally faulty teacher; in settings where the examples are being transmitted electronically, it might be due to unreliable communication equipment. Since one of the strengths of the Valiant model is the lack of assumptions on the probability distribution from which examples are drawn, we seek to preserve this generality in the presence of errors.
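To make the error model concrete, here is a minimal Python sketch of the two oracles the abstract contrasts: an error-free example oracle and a malicious variant that, with some probability beta, lets an adversary substitute an arbitrary labeled pair. All names here (ex, ex_mal, draw_example, target, adversary, beta) are illustrative assumptions, not notation from the paper.

    import random

    def ex(draw_example, target):
        # Error-free oracle: one correctly labeled example drawn from
        # the unknown, arbitrary distribution.
        x = draw_example()
        return (x, target(x))

    def ex_mal(draw_example, target, adversary, beta, rng=random):
        # Malicious-error oracle (illustrative): with probability beta
        # the adversary returns an arbitrary pair, about which no
        # assumption holds -- the instance, the label, or both may be
        # chosen adversarially. Otherwise behave like the error-free oracle.
        if rng.random() < beta:
            return adversary()
        return ex(draw_example, target)

    # Illustrative use: a monotone conjunction target over {0,1}^4 and a
    # label-flipping adversary, at a hypothetical error rate beta = 0.1.
    n = 4
    def target(x):
        return int(x[0] == 1 and x[2] == 1)  # the conjunction x1 AND x3
    def draw_example():
        return tuple(random.randint(0, 1) for _ in range(n))
    def adversary():
        x = draw_example()
        return (x, 1 - target(x))  # valid instance, deliberately wrong label
    sample = [ex_mal(draw_example, target, adversary, 0.1) for _ in range(5)]
    print(sample)

The point of the sketch is only the interface: the learner receives a mixed sample and cannot tell which pairs, if any, were corrupted.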

[1]  H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.

[2]  Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[3]  David S. Johnson. Approximation algorithms for combinatorial problems, 1973, STOC.

[4]  László Lovász. On the ratio of optimal integral and fractional covers, 1975, Discrete Mathematics.

[5]  Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[6]  Vasek Chvátal. A Greedy Heuristic for the Set-Covering Problem, 1979, Mathematics of Operations Research.

[7]  Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings, 1977, STOC '77.

[8]  Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.

[9]  David Haussler, et al. Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension, 1986, STOC '86.

[10]  Ronald L. Rivest. Learning decision lists, 1987, Machine Learning.

[11]  Leslie G. Valiant, et al. On the learnability of Boolean formulae, 1987, STOC.

[12]  David Haussler, et al. Occam's Razor, 1987, Information Processing Letters.

[13]  N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS 1987).

[14]  George Shackelford, et al. Learning k-DNF with noise in the attributes, 1988, Annual Conference on Computational Learning Theory.

[15]  P. Laird. Learning from Good and Bad Data, 1988.

[16]  Leslie G. Valiant, et al. A general lower bound on the number of examples needed for learning, 1988, COLT '88.

[17]  David Haussler, et al. Equivalence of models for polynomial learnability, 1988, COLT '88.

[18]  David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[19]  Leonard Pitt, et al. A polynomial-time algorithm for learning k-variable pattern languages from examples, 1989, COLT '89.

[20]  Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time, 1990, COLT '90.