Object Detection with Grammar Models

Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data.

[1]  William T. Freeman,et al.  Orientation Histograms for Hand Gesture Recognition , 1995 .

[2]  Kazuo Kyuma,et al.  Computer vision for computer games , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[3]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[4]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[5]  Dariu Gavrila,et al.  Real-time object detection for "smart" vehicles , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[6]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[8]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Cordelia Schmid,et al.  Learning to Parse Pictures of People , 2002, ECCV.

[10]  Luc Van Gool,et al.  Efficient pedestrian detection : a test case for SVM based categorization , 2002 .

[11]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[12]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[13]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[14]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[15]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[16]  R. Sukthankar,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[17]  D.M. Gavrila,et al.  Vision-based pedestrian detection: the PROTECTOR system , 2004, IEEE Intelligent Vehicles Symposium, 2004.

[18]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[19]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[20]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[21]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[22]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  E. L. Schwartz,et al.  Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception , 1977, Biological Cybernetics.

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[26]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[27]  Stuart Geman,et al.  Context and Hierarchy in a Probabilistic Image Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Jason Weston,et al.  Trading convexity for scalability , 2006, ICML.

[29]  Alexander J. Smola,et al.  Tighter Bounds for Structured Estimation , 2008, NIPS.

[30]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[31]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[32]  Long Zhu,et al.  Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Matthew B. Blaschko,et al.  Simultaneous Object Detection and Ranking with Weak Supervision , 2010, NIPS.

[37]  Dariu Gavrila,et al.  Multi-cue pedestrian classification with partial occlusion handling , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Antonio Torralba,et al.  Part and appearance sharing: Recursive Compositional Models for multi-view , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[40]  David A. McAllester,et al.  Generalization bounds and consistency for latent-structural probit and ramp loss , 2011, MLSLP.

[41]  Pedro F. Felzenszwalb Object detection grammars , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).