Paper information - Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

We study 3D shape modeling from a single image and make contributions to it in three aspects. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc. Building such a large-scale dataset, however, is highly challenging; existing datasets either contain only synthetic data, or lack precise alignment between 2D images and 3D shapes, or only have a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks.
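The abstract does not name the evaluation criteria it calibrates. As an illustration only, the sketch below computes two metrics commonly used for 3D shape reconstruction: intersection over union (IoU) on occupied voxels and the symmetric Chamfer distance on point sets. The function names, the set-of-indices voxel representation, and the choice of unsquared Euclidean distance are assumptions made for this example, not details taken from the paper.

```python
import math


def voxel_iou(pred_voxels, target_voxels):
    """IoU between two shapes given as collections of occupied voxel indices (i, j, k)."""
    pred, target = set(pred_voxels), set(target_voxels)
    union = pred | target
    if not union:
        # Convention chosen for this sketch: two empty shapes count as a perfect match.
        return 1.0
    return len(pred & target) / len(union)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    For each point, take the Euclidean distance to its nearest neighbor in the
    other set; average per direction and sum both directions.
    (Assumption: unsquared distances; some implementations use squared ones.)
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


if __name__ == "__main__":
    pred = {(0, 0, 0), (0, 0, 1)}
    target = {(0, 0, 1), (0, 1, 0)}
    print(voxel_iou(pred, target))  # one shared voxel out of three in the union
    print(chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]))
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for a small example; a spatial index such as a k-d tree would be the usual choice for dense point clouds.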
