3D Wireframe Modeling and Viewpoint Estimation for Multi-Class Objects Combining Deep Neural Network and Deformable Model Matching

Machine learning, especially deep learning with convolutional neural networks (CNNs), has greatly improved the accuracy of 3D viewpoint and shape estimation from 2D images. However, current methods are typically valid for only one specific category and perform poorly when generalized to others, so images containing multiple object classes require multiple detectors or networks. In this paper, we propose a method with strong generalization ability that combines a single CNN with deformable model matching to estimate the 3D viewpoint and shape of objects from multiple classes. The CNN detects keypoints of the potential object in the image, and a deformable model matching stage then performs 3D wireframe modeling and viewpoint estimation simultaneously, using the detected keypoints as support. In addition, parameter estimation by deformable model matching is robust to detection results that contain mistaken keypoints. The proposed method is evaluated on the Pascal3D+ dataset. Experiments show that it performs well in both parameter estimation accuracy and generalization across object classes. This research is a useful exploration of extending the generalization of deep learning in specific tasks.
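
The abstract does not spell out the matching stage, so the following is only a minimal sketch of one common way to fit a deformable wireframe to detected 2D keypoints, not the authors' algorithm. It assumes a linear shape model (a mean shape plus weighted basis deformations), a weak-perspective camera, and iteratively reweighted least squares with Cauchy weights as a stand-in for the fault tolerance to mistaken keypoints. All function names, the cube-shaped model, and the parameter values are hypothetical, and the keypoints are synthetic rather than CNN outputs.

```python
import numpy as np

def shape_from_coeffs(mean, basis, c):
    """Linear deformable model (assumed): S = mean + sum_i c_i * basis_i."""
    return mean + np.tensordot(c, basis, axes=1)

def fit_camera(S, W, w):
    """Weighted weak-perspective fit so that W ~= S @ M.T + t, with M = s * R[:2]."""
    w = w / w.sum()
    mu_S, mu_W = w @ S, w @ W                      # weighted centroids
    sw = np.sqrt(w)[:, None]
    # Unconstrained affine least squares on the centered, weighted points.
    A = np.linalg.lstsq((S - mu_S) * sw, (W - mu_W) * sw, rcond=None)[0].T
    # Snap the 2x3 affine matrix to the nearest scaled matrix with orthonormal rows.
    U, sv, Vt = np.linalg.svd(A, full_matrices=False)
    M = sv.mean() * U @ Vt
    return M, mu_W - M @ mu_S

def fit_shape(mean, basis, W, M, t, w, lam=1e-3):
    """Ridge-regularized weighted least squares for the shape coefficients."""
    sw = np.sqrt(w)[:, None]
    r = ((W - t - mean @ M.T) * sw).ravel()
    A = np.stack([((B @ M.T) * sw).ravel() for B in basis], axis=1)
    return np.linalg.solve(A.T @ A + lam * np.eye(len(basis)), A.T @ r)

def match_deformable_model(mean, basis, W, iters=30):
    """Alternate camera and shape updates, down-weighting keypoints that fit poorly."""
    c = np.zeros(len(basis))
    w = np.ones(len(W))
    for _ in range(iters):
        M, t = fit_camera(shape_from_coeffs(mean, basis, c), W, w)
        c = fit_shape(mean, basis, W, M, t, w)
        proj = shape_from_coeffs(mean, basis, c) @ M.T + t
        res = np.linalg.norm(W - proj, axis=1)     # per-keypoint reprojection error
        scale = 1.4826 * np.median(res) + 1e-9     # robust scale estimate
        w = 1.0 / (1.0 + (res / scale) ** 2)       # Cauchy weights in (0, 1]
    return M, t, c, w

# Hypothetical wireframe: a cube whose single basis deformation stretches it along x.
mean = np.array([[x, y, z] for x in (-1.0, 1.0)
                 for y in (-1.0, 1.0) for z in (-1.0, 1.0)])
basis = np.stack([mean * np.array([1.0, 0.0, 0.0])])   # shape (1, 8, 3)

# Synthetic "detected" keypoints from a known viewpoint, with one mistaken detection.
a, b = np.deg2rad(30.0), np.deg2rad(20.0)
Ry = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
M_true, t_true = 2.0 * (Rx @ Ry)[:2], np.array([5.0, -3.0])
keypoints = mean @ M_true.T + t_true
keypoints[0] += np.array([4.0, 4.0])                   # simulated detection error

M, t, c, weights = match_deformable_model(mean, basis, keypoints)
print("per-keypoint weights:", np.round(weights, 3))
```

In this sketch the viewpoint is read off from `M` (its common row norm is the scale, and the normalized rows are the first two rows of the rotation), while `c` gives the wireframe deformation. Keypoints with large reprojection error receive smaller weights in later iterations, which is one simple way to limit the influence of mistaken detections.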
