Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI--SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.

[1]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[2]  Charles R. Dyer,et al.  An algorithm for constructing the aspect graph , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[3]  Ulf Grenander,et al.  Hands: A Pattern Theoretic Study of Biological Shapes , 1990 .

[4]  William T. Freeman,et al.  Orientation Histograms for Hand Gesture Recognition , 1995 .

[5]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.

[6]  Yali Amit,et al.  Graphical Templates for Model Registration , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[9]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[10]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[11]  Tomaso A. Poggio,et al.  A general framework for object detection , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[12]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[13]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[14]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[15]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[16]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[17]  Daniel Snow,et al.  Efficient Deformable Template Detection and Localization without User Initialization , 2000, Comput. Vis. Image Underst..

[18]  David A. Forsyth,et al.  Mixtures of trees for object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[19]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[20]  Peter Russer,et al.  Electromagnetic field representations and computations in complex structures I: complexity architecture and generalized network formulation , 2002 .

[21]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[22]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[23]  R. Sukthankar,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[24]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[25]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[26]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[27]  Alan L. Yuille,et al.  Feature extraction from faces using deformable templates , 2004, International Journal of Computer Vision.

[28]  Yali Amit,et al.  Part-based statistical models for object classification and detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[29]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Pietro Perona,et al.  A discriminative framework for modelling object classes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Paul A. Viola,et al.  Multiple Instance Boosting for Object Detection , 2005, NIPS.

[33]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Daphna Weinshall,et al.  Efficient Learning of Relational Object Class Models , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[36]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[37]  Stuart Geman,et al.  Context and Hierarchy in a Probabilistic Image Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[38]  Pietro Perona,et al.  A Sparse Object Category Model for Efficient Learning and Complete Recognition , 2006, Toward Category-Level Object Recognition.

[39]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[40]  Cristian Sminchisescu,et al.  Training Deformable Models for Localization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[41]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[42]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[43]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[44]  Yali Amit,et al.  POP: Patchwork of Parts Models for Object Recognition , 2007, International Journal of Computer Vision.

[45]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[46]  David A. McAllester,et al.  The Generalized A* Architecture , 2007, J. Artif. Intell. Res..

[47]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[49]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..