How Does a Camera Look at One 3D CAD Object?

The camera pose, that is, the camera's rotation angles and translation vector (RT), is in one-to-one correspondence with a real 2D image when the intrinsic parameters are fixed. In this paper, we propose a novel convolutional neural network (CNN) based framework that estimates the 6-DOF RTs from images taken of one 3D CAD object, both directly and indirectly, and visually verifies the correctness of the predicted RTs. Such a framework enables us to accurately interpret how a camera looks at the object. The direct way is simple and experimentally obtains lower average errors for the predicted RTs, while the indirect way applies the POSIT algorithm to predicted landmarks and thereby avoids the non-Euclidean issue in rotation angles. To the best of our knowledge, we are the first to estimate the camera's RTs and effectively interpret how a camera looks at one 3D CAD object from images taken of it. Experiments on four models demonstrate the efficacy of the proposed approach both quantitatively and qualitatively.
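
The non-Euclidean issue in rotation angles mentioned above can be illustrated with a small sketch. It is not the authors' loss or evaluation code; the function names `naive_angle_error` and `wrapped_angle_error` are hypothetical, and angles are assumed to be given in degrees. Because angles are periodic, a plain difference between 359 and 1 degrees reports a large error even though the two orientations are nearly identical.

```python
def naive_angle_error(pred_deg, true_deg):
    """Plain difference, treating angles as ordinary real numbers."""
    return abs(pred_deg - true_deg)


def wrapped_angle_error(pred_deg, true_deg):
    """Shortest angular distance in degrees, respecting 360-degree periodicity."""
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    # Two nearly identical orientations on either side of the wrap-around.
    print(naive_angle_error(359.0, 1.0))    # 358.0
    print(wrapped_angle_error(359.0, 1.0))  # 2.0
```

A regressor trained on raw angles with a Euclidean loss is penalized according to the first measure, which is one way to see why routing the estimate through landmarks and POSIT sidesteps the problem.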
