The basic question addressed in this paper is: how can a learning algorithm cope with incorrect training examples? Specifically, how can algorithms that produce an “approximately correct” identification with “high probability” for reliable data be adapted to handle noisy data? We show that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the most consistent rule for the sample is sufficient, and usually requires a feasibly small number of examples, provided noise affects less than half the examples on average. In this setting we are able to estimate the rate of noise using only the knowledge that the rate is less than one half. The basic ideas extend to other types of random noise as well. We also show that the search problem associated with this strategy is intractable in general. However, for particular classes of rules the target rule may be efficiently identified if we use techniques specific to that class. For an important class of formulas—the k-CNF formulas studied by Valiant—we present a polynomial-time algorithm that identifies concepts in this form when the rate of classification errors is less than one half.
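To make the "select the most consistent rule" strategy concrete, here is a minimal Python sketch over a finite rule class, assuming independent classification noise whose rate is known only to be below some bound η_b < 1/2. The sample-size expression follows the familiar 2/(ε²(1−2η_b)²)·ln(2N/δ) form for finite classes under classification noise; the exact constants, and the names `sample_size` and `most_consistent_rule`, are illustrative assumptions rather than the paper's precise statement.

```python
import math
import random

def sample_size(epsilon, delta, eta_bound, num_rules):
    """Illustrative sample bound for minimizing disagreements over a finite
    class of num_rules rules when labels are flipped independently with some
    rate at most eta_bound < 1/2.  The constants are a sketch, not the paper's
    exact theorem."""
    assert 0 <= eta_bound < 0.5
    return math.ceil(2.0 / (epsilon ** 2 * (1.0 - 2.0 * eta_bound) ** 2)
                     * math.log(2.0 * num_rules / delta))

def most_consistent_rule(rules, sample):
    """Return the rule that disagrees with the fewest labeled examples.
    `rules` is a list of predicates x -> bool; `sample` is a list of
    (x, label) pairs whose labels may have been flipped by random noise."""
    def disagreements(rule):
        return sum(1 for x, label in sample if rule(x) != label)
    return min(rules, key=disagreements)

# Tiny demonstration: ten threshold rules on {0, ..., 9}, target threshold 5,
# labels flipped independently with probability 0.2 (below the assumed bound 0.3).
random.seed(0)
rules = [lambda x, t=t: x >= t for t in range(10)]
target = rules[5]
m = sample_size(epsilon=0.1, delta=0.05, eta_bound=0.3, num_rules=len(rules))
sample = []
for _ in range(m):
    x = random.randrange(10)
    label = target(x)
    if random.random() < 0.2:   # independent classification noise
        label = not label
    sample.append((x, label))
best = most_consistent_rule(rules, sample)
```

The point of the sketch is the one made in the abstract: even though individual labels are unreliable, with enough examples the rule with the fewest disagreements is, with high probability, close to the target, and only an upper bound on the noise rate (here 0.3) needs to be known.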
[1] Leslie G. Valiant et al. On the learnability of Boolean formulae. STOC, 1987.
[2] Leslie G. Valiant. A theory of the learnable. CACM, 1984.
[3] Vladimir Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics, 1982.
[4] David Haussler et al. Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension. STOC '86, 1986.
[5] Leslie G. Valiant. Learning Disjunctions of Conjunctions. IJCAI, 1985.
[6] Ming Li et al. Learning in the presence of malicious errors. STOC '88, 1988.
[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 1963.
[8] Richard Granger et al. Incremental Learning from Noisy Data. Machine Learning, 1986.
[9] David C. Wilkins et al. On Debugging Rule Sets When Reasoning Under Uncertainty. AAAI, 1986.
[10] Dana Angluin. Learning Regular Sets from Queries and Counterexamples. Inf. Comput., 1987.