3D Pose-by-Detection of Vehicles via Discriminatively Reduced Ensembles of Correlation Filters

Estimating the precise pose of a 3D model in an image is challenging because explicitly identifying correspondences is difficult, particularly at smaller scales and in the presence of occlusion. Exemplar classifiers have demonstrated the potential of detection-based approaches to problems where precision is required. In particular, correlation filters explicitly suppress the classifier response caused by slight shifts in the bounding box. This property makes them well suited as exemplar classifiers for viewpoint discrimination, since small translational shifts are easily confounded with small rotational shifts. However, exemplar-based pose-by-detection does not scale: as the desired precision of viewpoint estimation increases, so does the number of exemplars needed. We present a training framework that reduces an ensemble of exemplar correlation filters for viewpoint estimation by directly optimizing a discriminative objective. We show that the discriminatively reduced ensemble outperforms the state of the art on three publicly available datasets, and we introduce a new dataset for continuous car pose estimation in street scene images.
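The shift-suppression property of correlation filters mentioned above can be illustrated with a minimal sketch. It assumes a single-channel, single-exemplar filter trained in closed form in the Fourier domain (a MOSSE-style ridge regression), which is a simplification and not the vector-feature filters or the ensemble reduction used in the paper. The function names, the Gaussian target width `sigma`, the regularizer `reg`, and the random stand-in image are all hypothetical choices made for illustration.

```python
import numpy as np

def train_correlation_filter(image, sigma=1.0, reg=1e-3):
    """Train a single-exemplar correlation filter (frequency domain).

    Assumption: the desired response is a narrow Gaussian peaked at zero
    shift, so any translated copy of the exemplar should score low there.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Circular distance from the origin, so the peak sits at index (0, 0).
    dy = np.minimum(ys, h - ys)
    dx = np.minimum(xs, w - xs)
    target = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

    F = np.fft.fft2(image)
    G = np.fft.fft2(target)
    # Closed-form regularized least-squares solution, one value per frequency.
    return G * np.conj(F) / (F * np.conj(F) + reg)

def response(filter_hat, image):
    """Evaluate the filter at every circular shift of the input image."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * filter_hat))

# Hypothetical stand-in for an exemplar patch (random texture, fixed seed).
rng = np.random.default_rng(0)
exemplar = rng.standard_normal((32, 32))
filt = train_correlation_filter(exemplar)

# Score at zero shift for the aligned exemplar and for a copy moved 3 pixels.
shifted = np.roll(exemplar, 3, axis=1)
score_aligned = response(filt, exemplar)[0, 0]
score_shifted = response(filt, shifted)[0, 0]
peak_location = np.unravel_index(np.argmax(response(filt, shifted)), shifted.shape)
```

Under these assumptions, `score_aligned` is close to the peak of the target response while `score_shifted` is close to zero, and `peak_location` moves with the translation. This is the behaviour that lets a bounding-box shift be told apart from a change of viewpoint exemplar.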
