Discriminative structure learning of hierarchical representations for object detection

A variety of flexible models have been proposed to detect objects in challenging real-world scenes. Motivated by some of the most successful techniques, we propose a hierarchical multi-feature representation and automatically learn flexible hierarchical object models for a wide variety of object classes. To that end, we not only rely on automatic selection of relevant individual features, but also go beyond previous work by automatically selecting and modeling complex, long-range feature couplings within this model. To achieve this generality and flexibility, our work combines structure learning in conditional random fields with discriminative parameter learning of classifiers using hierarchical features. We adopt an efficient gradient-based heuristic for model selection and extend it to discriminative, multidimensional selection of features and their couplings for improved detection performance. Experimentally, we consistently outperform the currently leading method on all 20 classes of the PASCAL VOC 2007 challenge and achieve the best published results on 16 of the 20 classes.
