3D ShapeNets: A Deep Representation for Volumetric Shape Modeling

3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g., Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape model in the loop. Apart from object recognition on 2.5D depth maps, recovering full 3D shapes from these incomplete observations is critical for analyzing shape variations. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses. It naturally supports joint object recognition and shape reconstruction from 2.5D depth maps and, as an additional application, enables active object recognition through view planning. We construct a large-scale 3D CAD model dataset to train our model, and conduct extensive experiments to study our new representation.
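To make the voxel representation concrete, the following is a minimal sketch of the kind of input such a model consumes. It does not implement the Convolutional Deep Belief Network. The function names `voxelize` and `label_from_depth`, the unit-cube normalization, the single axis-aligned viewpoint, and the three labels are all illustrative assumptions, not details taken from the abstract.

```python
def voxelize(points, resolution):
    """Return a resolution^3 nested list of 0/1 occupancy values.

    Assumption: `points` are (x, y, z) tuples lying in the unit cube [0, 1].
    """
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for point in points:
        # Scale each coordinate to a voxel index; clamp 1.0 into the last cell.
        i, j, k = (min(int(c * resolution), resolution - 1) for c in point)
        grid[i][j][k] = 1
    return grid


def label_from_depth(depth_map, resolution):
    """Label voxels as seen from a single view looking along the +z axis.

    Assumption: depth_map[i][j] is the voxel index of the first surface hit
    along z, or None if the ray hits nothing. Voxels in front of the surface
    are 'free', the hit voxel is 'surface', and voxels behind it are
    'unknown' because a 2.5D depth map cannot observe them.
    """
    labels = [[["free"] * resolution for _ in range(resolution)]
              for _ in range(resolution)]
    for i in range(resolution):
        for j in range(resolution):
            d = depth_map[i][j]
            if d is None:
                continue  # the whole ray passes through free space
            labels[i][j][d] = "surface"
            for k in range(d + 1, resolution):
                labels[i][j][k] = "unknown"  # occluded behind the surface
    return labels
```

In this sketch, the 'unknown' voxels are the ones a shape model would have to fill in when completing a depth map to a full 3D shape; the binary grid from `voxelize` corresponds to the fully observed case, such as a CAD model.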
