3D Pose Regression Using Convolutional Neural Networks

3D pose estimation is a key component of many computer vision tasks, such as autonomous navigation and robot manipulation. Current state-of-the-art approaches to 3D object pose estimation, such as Viewpoints & Keypoints [20] and Render for CNN [1], discretize the pose space into bins and solve a pose-classification task. We argue that, because 3D pose is continuous, it can instead be estimated in a regression framework, provided that the right representation, data augmentation, and loss function are used. We modify a standard VGG network for the task of 3D pose regression and show performance competitive with the state of the art.
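To make the contrast between binning and regression concrete, the sketch below works under stated assumptions in plain Python. On the classification side, it quantizes a single viewpoint angle (azimuth) into a fixed number of bins. On the regression side, it represents a 3D rotation as an axis-angle (rotation) vector and scores a prediction with a geodesic loss, which is the angle of the relative rotation between prediction and target. The bin counts, the axis-angle parameterization, the geodesic loss, and every function name here are illustrative assumptions. The abstract does not specify which representation or loss the authors use.

```python
import math


def bin_azimuth(theta, num_bins):
    """Classification view: map an angle (radians) to one of num_bins equal bins."""
    width = 2 * math.pi / num_bins
    return int((theta % (2 * math.pi)) // width)


def bin_center(index, num_bins):
    """Angle represented by a bin: its center (hypothetical decoding rule)."""
    width = 2 * math.pi / num_bins
    return (index + 0.5) * width


def angular_error(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def axis_angle_to_matrix(v):
    """Regression view: Rodrigues' formula, rotation vector v = theta * axis -> 3x3 matrix."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in v)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(theta), math.cos(theta)
    # R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]


def geodesic_distance(R1, R2):
    """Angle of the relative rotation R1^T R2, in [0, pi]."""
    # trace(R1^T R2) equals the elementwise (Frobenius) inner product of R1 and R2.
    trace = sum(R1[k][i] * R2[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.acos(cos_angle)


def geodesic_loss(pred_vec, target_vec):
    """Hypothetical regression loss: geodesic angle between two axis-angle poses."""
    return geodesic_distance(axis_angle_to_matrix(pred_vec), axis_angle_to_matrix(target_vec))


if __name__ == "__main__":
    theta = 1.0
    for n in (8, 24):
        est = bin_center(bin_azimuth(theta, n), n)
        print(n, "bins -> quantization error", angular_error(theta, est))
    print("geodesic loss", geodesic_loss([0.0, 0.0, 0.2], [0.0, 0.0, 0.7]))
```

With N bins, the decoded estimate can be off by up to half a bin (π/N) even when the classifier picks the correct bin. The geodesic loss, by contrast, depends only on the rotations themselves. For example, axis-angle vectors of π/2 and -3π/2 about the same axis describe the same rotation and get zero loss, which a plain L2 loss on the raw vectors would not give them.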

References

[1] Leonidas J. Guibas, et al. Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2] Roberto Cipolla, et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] P. Beckman, et al. Accelerating scientific discovery: 2007 annual report, 2008.

[5] Roberto Cipolla, et al. Modelling uncertainty in deep learning for camera relocalization, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[6] Xiaowei Zhou, et al. 6-DoF object pose from semantic keypoints, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[7] Silvio Savarese, et al. Beyond PASCAL: A benchmark for 3D object detection in the wild, 2014, IEEE Winter Conference on Applications of Computer Vision.

[8] Deva Ramanan, et al. Analysis by Synthesis: 3D Object Recognition by Object Reconstruction, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.

[10] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[11] Yuandong Tian, et al. Single Image 3D Interpreter Network, 2016, ECCV.

[12] Ahmed M. Elgammal, et al. Digging Deep into the Layers of CNNs: In Search of How CNNs Achieve View Invariance, 2015, ICLR.

[13] Nancy Wilkins-Diehr, et al. XSEDE: Accelerating Scientific Discovery, 2014, Computing in Science & Engineering.

[14] R. Vidal, et al. Distributed 3-D localization in camera networks, 2009.

[15] Ahmed M. Elgammal, et al. A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation, 2016, ICML.

[16] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[18] Roberto Cipolla, et al. Geometric Loss Functions for Camera Pose Regression with Deep Learning, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Ralph Roskies, et al. Bridges: a uniquely flexible HPC resource for new communities and data analytics, 2015, XSEDE.

[20] Jitendra Malik, et al. Viewpoints and keypoints, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Takeo Kanade, et al. A statistical approach to 3D object detection applied to faces and cars, 2000.

[22] Peter V. Gehler, et al. Teaching 3D geometry to deformable part models, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23] S. Shankar Sastry, et al. An Invitation to 3-D Vision: From Images to Geometric Models, 2003.