3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

3D shape is a crucial but heavily underutilized cue in object recognition, mostly due to the lack of a good generic shape representation. With the recent proliferation of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a useful 3D shape model in an object recognition pipeline. Furthermore, when recognition has low confidence, an object recognition system needs a fail-safe mode that intelligently chooses the next viewpoint from which to obtain an additional observation, so as to reduce the uncertainty as much as possible. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model naturally supports object recognition from 2.5D depth maps as well as view planning for object recognition. We construct a large-scale 3D computer graphics dataset to train our model, and conduct extensive experiments to study this new representation.
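
The two ideas in the abstract, a binary voxel-grid view of a depth map and a view choice that reduces recognition uncertainty, can be illustrated with a minimal sketch. It is not the paper's model: no Convolutional Deep Belief Network is involved, and it rests on several assumptions made only for illustration. The camera is orthographic and looks along the last grid axis, depth values are given in voxel units, and each candidate view yields one of a few discrete outcomes with a known likelihood table. The names depth_to_voxels, expected_posterior_entropy and next_best_view are hypothetical.

```python
import numpy as np

FREE, SURFACE, UNKNOWN = 0, 1, -1


def depth_to_voxels(depth, grid):
    """Label a grid x grid x grid volume from an orthographic depth map.

    Assumption: depth[i, j] is the distance (in voxels) along the last axis
    at which the ray through pixel (i, j) first hits the object.
    Voxels in front of the hit are free, the hit voxel is surface, and
    everything behind it is occluded and therefore unknown.
    """
    labels = np.full((grid, grid, grid), UNKNOWN, dtype=np.int8)
    for i in range(grid):
        for j in range(grid):
            d = depth[i, j]
            if not np.isfinite(d) or d >= grid:
                labels[i, j, :] = FREE  # ray left the volume without a hit
                continue
            k = int(d)
            labels[i, j, :k] = FREE
            labels[i, j, k] = SURFACE
    return labels


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())


def expected_posterior_entropy(prior, likelihood):
    """Expected entropy of the class posterior after one observation.

    likelihood[o, c] is an assumed table p(outcome o | class c) for one view.
    """
    prior = np.asarray(prior, dtype=float)
    joint = np.asarray(likelihood, dtype=float) * prior[None, :]  # p(o, c)
    p_obs = joint.sum(axis=1)                                     # p(o)
    total = 0.0
    for o in range(joint.shape[0]):
        if p_obs[o] > 0:
            total += p_obs[o] * entropy(joint[o] / p_obs[o])
    return total


def next_best_view(prior, likelihoods):
    """Return the index of the view with the lowest expected posterior entropy."""
    scores = [expected_posterior_entropy(prior, lk) for lk in likelihoods]
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    depth = np.full((4, 4), 2.0)
    depth[0, 0] = np.inf  # this ray misses the object
    vox = depth_to_voxels(depth, grid=4)
    print(vox[1, 1], vox[0, 0])

    prior = [0.5, 0.5]
    uninformative = [[0.5, 0.5], [0.5, 0.5]]
    informative = [[0.9, 0.1], [0.1, 0.9]]
    print(next_best_view(prior, [uninformative, informative]))
```

In this toy setting the unknown voxels are the ones a shape model would have to fill in, and the view whose outcomes best separate the candidate classes is the one selected. A learned model would replace the hand-written likelihood tables with predictions of what each viewpoint is likely to reveal.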
