Pruning training sets for learning of object categories

Training datasets for learning of object categories are often contaminated or imperfect. We explore an approach that automatically identifies examples which are noisy or troublesome for learning and excludes them from the training set. The problem is relevant to learning in semi-supervised or unsupervised settings, as well as to learning when the training data is contaminated with wrongly labeled examples or when correctly labeled but hard-to-learn examples are present. We propose a fully automatic mechanism for noise cleaning, called 'data pruning', and demonstrate its success on learning of human faces. It is not assumed that the data or the noise can be modeled or that additional training examples are available. Our experiments show that data pruning can improve generalization performance for algorithms with varying robustness to noise. It outperforms methods with regularization properties and is superior to commonly applied aggregation methods, such as bagging.
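The abstract does not spell out the pruning mechanism itself. The sketch below illustrates the general data-pruning idea only: train several out-of-sample learners and exclude training examples that they consistently misclassify before fitting the final model. The function name, the disagreement threshold, and the use of shallow decision trees via scikit-learn are illustrative assumptions, not the authors' exact method.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def prune_training_set(X, y, n_rounds=10, misclass_threshold=0.8, seed=0):
    """Return indices of training examples kept after pruning.

    Hypothetical sketch: an example is dropped if out-of-fold learners
    misclassify it in at least `misclass_threshold` of the rounds.
    """
    rng = np.random.RandomState(seed)
    misclassified = np.zeros(len(y), dtype=float)

    for _ in range(n_rounds):
        # Out-of-fold predictions so each example is judged by learners
        # that never saw it during training.
        skf = StratifiedKFold(n_splits=5, shuffle=True,
                              random_state=rng.randint(1 << 30))
        for train_idx, test_idx in skf.split(X, y):
            clf = DecisionTreeClassifier(max_depth=3,
                                         random_state=rng.randint(1 << 30))
            clf.fit(X[train_idx], y[train_idx])
            pred = clf.predict(X[test_idx])
            misclassified[test_idx] += (pred != y[test_idx])

    # Fraction of rounds in which each example was misclassified.
    rate = misclassified / n_rounds
    return np.where(rate < misclass_threshold)[0]
```

Under these assumptions, the final classifier would then be retrained on the retained subset, e.g. `kept = prune_training_set(X, y); clf.fit(X[kept], y[kept])`.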
