Improving object detection by removing noisy samples from training sets

Object detection is often formulated as a binary classification task and solved with supervised learning on a training dataset. Training datasets usually contain noisy samples, including mislabeled samples and "hard-to-learn" samples. Such samples degrade the generalization performance of trained classifiers and should be pruned. In this paper, we propose a novel data pruning algorithm based on a recursive Bayes approach and AdaBoost. The recursive Bayes approach increases the confidence of predictions in every iteration, while AdaBoost minimizes the number of predictions that have low confidence. Extensive experiments on real datasets show that the proposed algorithm identifies and prunes noisy samples from training datasets and, in doing so, improves the performance of classification and object detection.
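
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a recursive Bayes update can sharpen a per-sample prediction over several rounds and how low-confidence labels might then be pruned. The function names `recursive_bayes` and `prune`, the threshold of 0.1, the uniform prior, and the assumption that each round supplies an independent probability strictly between 0 and 1 are all hypothetical and are not taken from the paper. The AdaBoost component is not reproduced here.

```python
def recursive_bayes(prior, evidence):
    """Fold per-round probabilities into a single posterior.

    Each entry of `evidence` is treated as P(positive) reported by one
    round's predictor. Rounds are assumed independent, and every value is
    assumed to lie strictly between 0 and 1 (illustrative assumptions).
    """
    posterior = prior
    for p in evidence:
        numerator = p * posterior
        denominator = numerator + (1.0 - p) * (1.0 - posterior)
        posterior = numerator / denominator
    return posterior


def prune(labels, evidence_per_sample, threshold=0.1, prior=0.5):
    """Return the indices of samples whose given label stays plausible.

    A sample is dropped when the posterior probability of its own label
    falls below `threshold` (a hypothetical cut-off).
    """
    kept = []
    for index, (label, evidence) in enumerate(zip(labels, evidence_per_sample)):
        p_positive = recursive_bayes(prior, evidence)
        p_label = p_positive if label == 1 else 1.0 - p_positive
        if p_label >= threshold:
            kept.append(index)
    return kept


if __name__ == "__main__":
    labels = [1, 1, 0]
    evidence = [[0.8, 0.9], [0.2, 0.1], [0.3, 0.2]]
    # The second sample is labeled positive but every round disagrees.
    print(prune(labels, evidence))
```

In this toy run, two rounds that each report 0.8 for a sample combine to a posterior above 0.8, which is the sense in which repeated updates increase confidence. A sample whose label is contradicted round after round ends up with a small posterior for that label and is removed.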
