Monocular Object Orientation Estimation using Riemannian Regression and Classification Networks

We consider the task of estimating the 3D orientation of an object of known category, given an image of the object and a bounding box around it. Recently, CNN-based regression and classification methods have shown significant performance improvements for this task. This paper proposes a new CNN-based approach to monocular orientation estimation that makes four contributions. First, we take into account the Riemannian structure of the orientation space when designing regression losses and nonlinear activation functions. Second, we propose a mixed Riemannian regression and classification framework that better handles the challenging case of nearly symmetric objects. Third, we propose a data augmentation strategy that is specifically designed to capture changes in 3D orientation. Fourth, we show that the resulting approach achieves state-of-the-art results on the PASCAL3D+ dataset.
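
To make concrete what respecting the Riemannian structure of the orientation space can mean for a regression loss, the sketch below computes the standard geodesic distance on the rotation group SO(3), d(R1, R2) = arccos((tr(R1^T R2) - 1) / 2), which is the angle of the relative rotation between two orientations. This is a minimal illustration using only the Python standard library, not the loss or network defined in the paper; the function names `rotation_from_axis_angle` and `geodesic_distance` are hypothetical helpers introduced here.

```python
import math


def rotation_from_axis_angle(axis, angle):
    """Build a 3x3 rotation matrix (nested lists) with Rodrigues' formula.

    Hypothetical helper for illustration; `axis` need not be unit length.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + t * x * x, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]


def geodesic_distance(r1, r2):
    """Geodesic distance on SO(3), in radians, between two rotation matrices.

    Uses tr(R1^T R2) = sum_ij R1[i][j] * R2[i][j], so no matrix product
    is needed. The result lies in [0, pi].
    """
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


if __name__ == "__main__":
    z_axis = (0.0, 0.0, 1.0)
    r_pos = rotation_from_axis_angle(z_axis, 3.0)
    r_neg = rotation_from_axis_angle(z_axis, -3.0)
    print(geodesic_distance(r_pos, r_neg))
```

The example pair shows why this matters: rotations by +3.0 rad and -3.0 rad about the same axis differ by 6 rad when their angles are compared naively, but they are only 2π - 6 ≈ 0.283 rad apart on the manifold, which is the value the geodesic distance returns.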
