Discriminatively Trained And-Or Tree Models for Object Detection

This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an over complete set of shape primitives, and then organize them into a directed a cyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

[1]  Yunde Jia,et al.  Learning Global and Reconfigurable Part-Based Models for Object Detection , 2012, 2012 IEEE International Conference on Multimedia and Expo.

[2]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[3]  Fahad Shahbaz Khan,et al.  Color attributes for object detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Deva Ramanan,et al.  Steerable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Stan Sclaroff,et al.  Contextual Object Detection Using Set-Based Classification , 2012, ECCV.

[6]  Tieniu Tan,et al.  Boosted local structured HOG-LBP for object localization , 2011, CVPR 2011.

[7]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[8]  Song-Chun Zhu,et al.  Learning AND-OR Templates for Object Recognition and Detection , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Stefan Carlsson,et al.  Mixture Component Identification and Learning for Visual Recognition , 2012, ECCV.

[10]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Jitendra Malik,et al.  Multi-component Models for Object Detection , 2012, ECCV.

[13]  Song-Chun Zhu,et al.  Hierarchical Space Tiling for Scene Modeling , 2012, ACCV.

[14]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[17]  Trevor Darrell,et al.  Sparselet Models for Efficient Multiclass Object Detection , 2012, ECCV.

[18]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[20]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[23]  Jun Zhu,et al.  Learning reconfigurable scene representation by tangram model , 2012, 2012 IEEE Workshop on the Applications of Computer Vision (WACV).

[24]  Roozbeh Mottaghi,et al.  Augmenting deformable part models with irregular-shaped object patches , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Pietro Perona,et al.  Strong supervision from weak annotation: Interactive training of deformable part models , 2011, 2011 International Conference on Computer Vision.