Semi-supervised training of models for appearance-based statistical object detection methods

Appearance-based object detection systems using statistical models have proven quite successful. They can reliably detect textured, rigid objects in a variety of poses, lighting conditions and scales. However, constructing these systems is time-consuming and difficult, because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Typically, this requires indicating which regions of the image correspond to the object to be detected and which belong to background clutter, as well as marking key landmark locations on the object. The goal of this work is to pursue and evaluate approaches that reduce the number of fully labeled examples needed, by training these models in a semi-supervised manner. To this end, we develop approaches based on Expectation-Maximization and self-training that utilize a small number of fully labeled training examples in combination with a set of “weakly labeled” examples. Weakly labeled data are inherently less costly to generate, since the label information is specified in an uncertain or incomplete fashion. For example, a weakly labeled image might be labeled as containing the training object, with the object location and scale left unspecified. We analyze the performance of the techniques developed through a comprehensive empirical investigation. We find that supplementing a small fully labeled training set with weakly labeled data in the training process reliably improves detector performance for a variety of detection approaches. The outcome is the identification of successful approaches and key issues that are central to achieving good performance in the semi-supervised training of object detection systems.
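To make the self-training idea concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a toy setting in which every candidate window is reduced to a single scalar feature, the detector is a pair of class means, and each weakly labeled image is a list of candidate windows of which exactly one is the object. The names `train`, `score`, and `self_train`, the confidence-based selection rule, and the parameters `rounds` and `per_round` are all hypothetical.

```python
from statistics import mean

def train(obj_feats, bg_feats):
    """Fit a toy detector: one mean per class (stand-in for a real appearance model)."""
    return mean(obj_feats), mean(bg_feats)

def score(model, x):
    """Higher is more object-like: distance to background mean minus distance to object mean."""
    obj_mu, bg_mu = model
    return abs(x - bg_mu) - abs(x - obj_mu)

def self_train(obj_feats, bg_feats, weak_images, rounds=3, per_round=1):
    """Repeatedly label the most confidently detected weak images, then retrain."""
    obj_feats, bg_feats = list(obj_feats), list(bg_feats)
    remaining = list(weak_images)
    model = train(obj_feats, bg_feats)
    for _ in range(rounds):
        if not remaining:
            break
        # Rank weak images by the score of their best candidate window.
        ranked = sorted(
            ((max(score(model, x) for x in img), i) for i, img in enumerate(remaining)),
            reverse=True,
        )
        chosen = {i for _, i in ranked[:per_round]}
        for i in chosen:
            img = remaining[i]
            best = max(range(len(img)), key=lambda j: score(model, img[j]))
            # Assumption: the best window is the object, the rest are background.
            obj_feats.append(img[best])
            bg_feats.extend(x for j, x in enumerate(img) if j != best)
        remaining = [img for i, img in enumerate(remaining) if i not in chosen]
        model = train(obj_feats, bg_feats)
    return model, obj_feats, bg_feats

# Two fully labeled examples per class plus three weakly labeled images.
model, obj, bg = self_train(
    [1.0, 1.2], [0.0, -0.2],
    [[0.1, 0.9, -0.1], [1.4, 0.2], [0.0, 0.7]],
)
```

Adding only the most confident images in each round is one possible selection rule. It is meant to limit the risk that an early mislabeled window corrupts the retrained model, and other selection metrics could be substituted for `score`.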
