Most existing supervised machine learning frameworks assume that the training samples are free of mistakes and misinterpretations. This assumption often fails in practice: whenever humans are involved in producing training samples, the training set may contain labeling errors. In this paper, we study the effect of imperfect training samples on supervised machine learning. We focus on a mathematical framework that describes the learnability of noisy training data, and we examine theorems that bound the error of the learned models and estimate the number of training samples required. These bounds depend on the amount of training data and on the probability that a training label is correct. Building on this learnability analysis for imperfect annotation, we describe an autonomous learning framework that uses cross-modality information to learn concept models. For instance, visual concept models can be trained from the detection results of Automatic Speech Recognition, closed captions, or prior detectors in the same modality; these detection results on an unlabeled training set serve as imperfect labels for the models to be built. We have built a prototype system based on this technique, and our experiments show promising results.
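
As a concrete illustration of the kind of theorem involved, consider a classical bound from Angluin and Laird's random-classification-noise model (a representative result of this type, not necessarily the exact form used here). For a finite hypothesis class H, a label-noise rate \eta < 1/2, target error \varepsilon, and confidence 1 - \delta, it suffices to draw

    m \ge \frac{2}{\varepsilon^{2}(1 - 2\eta)^{2}} \,\ln\frac{2|H|}{\delta}

noisy samples and return the hypothesis that disagrees with the fewest of them. The bound makes the dependence stated above explicit: the required sample size grows with |H| and blows up as the label-accuracy probability 1 - \eta approaches 1/2.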
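
To make the cross-modality learning loop concrete, here is a minimal sketch, not the paper's actual system: detection confidences from one modality (e.g., a detector run over ASR transcripts) are thresholded into imperfect labels, which then supervise an SVM over visual features. The function name, feature shapes, and thresholds are illustrative assumptions.

    # Minimal sketch of cross-modality training from imperfect labels.
    # Names, shapes, and thresholds are hypothetical, not from the paper.
    import numpy as np
    from sklearn.svm import SVC

    def train_from_noisy_modality(visual_feats, asr_scores,
                                  pos_thresh=0.8, neg_thresh=0.2):
        """Train a visual concept model from cross-modal pseudo-labels.

        visual_feats: (n, d) array of visual features for unlabeled shots
        asr_scores:   (n,) detection confidences from a speech/text detector
        """
        # Keep only shots the other modality is confident about; this
        # trades training-set size for a lower effective noise rate eta.
        keep = (asr_scores >= pos_thresh) | (asr_scores <= neg_thresh)
        X = visual_feats[keep]
        y = (asr_scores[keep] >= pos_thresh).astype(int)  # imperfect labels
        model = SVC(kernel="rbf", probability=True)
        return model.fit(X, y)

Filtering to high-confidence cross-modal detections is one way to keep the label-accuracy probability well away from 1/2, which, per the bound above, is exactly the regime in which learning from imperfect labels remains feasible.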