Paper Information - Viewpoint Estimation for Objects with Convolutional Neural Network Trained on Synthetic Images

In this paper, we propose a method to estimate object viewpoint from a single RGB image and address two problems in estimation: generating training data with viewpoint annotations and extracting powerful features for the estimation. We first collect 1,780 high-quality 3D CAD object models from 3 categories. We then generate a synthetic RGB image dataset with viewpoint annotations, in which each image is produced by placing one model in a realistic panorama scene and rendering it with random camera parameters. We train a CNN model on our synthetic dataset to predict the object viewpoint. The proposed method is evaluated on the PASCAL 3D+ dataset and on our synthetic dataset. The experimental results show good performance.
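The annotation step described above can be illustrated with a minimal sketch. Because the camera parameters are sampled before rendering, each synthetic image gets its viewpoint label for free. The abstract does not give the parameter ranges, the camera model, or the form of the CNN target, so everything below is an assumption made for illustration: the function names (`sample_viewpoint`, `camera_position`, `azimuth_to_bin`), the angle and distance ranges, and the choice of 24 azimuth bins are all hypothetical.

```python
import math
import random


def sample_viewpoint(rng):
    """Draw one random camera viewpoint (angles in degrees).

    All ranges are illustrative assumptions, not values from the paper.
    """
    return {
        "azimuth": rng.uniform(0.0, 360.0),    # rotation around the object
        "elevation": rng.uniform(-10.0, 60.0),  # camera height angle (assumed range)
        "tilt": rng.uniform(-15.0, 15.0),       # in-plane rotation (assumed range)
        "distance": rng.uniform(2.0, 5.0),      # camera-to-object distance (assumed range)
    }


def camera_position(view):
    """Place the camera on a sphere around an object at the origin (z axis up)."""
    az = math.radians(view["azimuth"])
    el = math.radians(view["elevation"])
    d = view["distance"]
    return (
        d * math.cos(el) * math.cos(az),
        d * math.cos(el) * math.sin(az),
        d * math.sin(el),
    )


def azimuth_to_bin(azimuth_deg, num_bins=24):
    """Map an azimuth to a class index, with bins centered on multiples of 360/num_bins."""
    width = 360.0 / num_bins
    return int(((azimuth_deg % 360.0) + width / 2) // width) % num_bins


if __name__ == "__main__":
    rng = random.Random(0)
    view = sample_viewpoint(rng)
    # The sampled viewpoint doubles as the annotation for the rendered image.
    print(view, camera_position(view), azimuth_to_bin(view["azimuth"]))
```

Discretizing the azimuth into bins is one common way to turn viewpoint estimation into a classification target for a CNN. Whether the paper uses classification, regression, or something else is not stated in the abstract.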
