Learning Shape Abstractions by Assembling Volumetric Primitives

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.

[1]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[2]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[3]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[4]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[6]  Alexei A. Efros,et al.  Segmenting Scenes by Matching Image Composites , 2009, NIPS.

[7]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[8]  Vladlen Koltun,et al.  Joint shape segmentation with linear programming , 2011, ACM Trans. Graph..

[9]  Daniel Cohen-Or,et al.  Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering , 2011, ACM Trans. Graph..

[10]  Siddhartha Chaudhuri,et al.  A probabilistic model for component-based shape synthesis , 2012, ACM Trans. Graph..

[11]  Levent Burak Kara,et al.  Co-abstraction of shape collections , 2012, ACM Trans. Graph..

[12]  Daniel Cohen-Or,et al.  Active co-analysis of a set of shapes , 2012, ACM Trans. Graph..

[13]  Michal Irani,et al.  Co-segmentation by Composition , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Ce Liu,et al.  Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[16]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  N. Mitra,et al.  Meta-representation of shape families , 2014, ACM Trans. Graph..

[18]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[20]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[21]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[22]  Levent Burak Kara,et al.  Co-constrained handles for deformation in shape collections , 2014, ACM Trans. Graph..

[23]  Daniel Cohen-Or,et al.  Recurring part arrangements in shape collections , 2014, Comput. Graph. Forum.

[24]  Leonidas J. Guibas,et al.  Data-driven structural priors for shape completion , 2015, ACM Trans. Graph..

[25]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[26]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Vladlen Koltun,et al.  Single-view reconstruction via joint analysis of image and shape collections , 2015, ACM Trans. Graph..

[28]  Lourdes Agapito,et al.  Part-based modelling of compound scenes from images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Leonidas J. Guibas,et al.  Joint embeddings of shapes and images via CNN image purification , 2015, ACM Trans. Graph..

[30]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[31]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Kevin Murphy,et al.  Efficient inference in occlusion-aware generative models of images , 2015, ArXiv.

[33]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[34]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[35]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[36]  Bruno A. Olshausen,et al.  Discovering Hidden Factors of Variation in Deep Networks , 2014, ICLR.

[37]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[38]  Jitendra Malik,et al.  Cross Modal Distillation for Supervision Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[40]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.