Paper Information - 3D Pose Regression Using Convolutional Neural Networks

3D pose estimation is a key component of many important computer vision tasks, such as autonomous navigation and robot manipulation. Current state-of-the-art approaches to 3D object pose estimation, such as Viewpoints & Keypoints and Render for CNN, discretize the pose space into bins and solve a pose-classification task. We argue that 3D pose is continuous and can be estimated in a regression framework, given the right representation, data augmentation, and loss function. We modify a standard VGG network for the task of 3D pose regression and show performance competitive with the state of the art.
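The contrast between binned classification and continuous regression can be illustrated with a small sketch. The abstract does not specify which representation or loss the authors use, so the choices here are assumptions made for illustration only: rotations are written as axis-angle vectors, the error is measured as the geodesic angle between two rotation matrices, and the helper names (`axis_angle_to_matrix`, `geodesic_distance`, `bin_center`) are hypothetical.

```python
import numpy as np


def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector (unit axis * angle in radians) -> 3x3 rotation."""
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)  # zero rotation
    k = v / theta
    # Skew-symmetric cross-product matrix of the unit axis k
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def geodesic_distance(R1, R2):
    """Angle (radians) of the relative rotation R1^T R2; an assumed choice of pose error."""
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def bin_center(angle_deg, num_bins):
    """Center of the bin an angle falls into when 360 degrees is split into num_bins bins."""
    width = 360.0 / num_bins
    return (np.floor((angle_deg % 360.0) / width) + 0.5) * width


# Classification view: a 47 degree azimuth with 24 bins can only be reported as its bin center.
quantized = bin_center(47.0, 24)
print("bin center:", quantized, "quantization error (deg):", abs(quantized - 47.0))

# Regression view: a continuous prediction is scored by a continuous rotation error.
R_true = axis_angle_to_matrix([0.0, 0.0, 0.5])
R_pred = axis_angle_to_matrix([0.0, 0.0, 0.2])
print("geodesic error (rad):", geodesic_distance(R_true, R_pred))
```

Under these assumptions, a classifier's output is limited by the bin width even when it picks the correct bin, whereas a regressor's error can in principle shrink continuously toward zero.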
