Paper Information - CSGNet: Neural Shape Parser for Constructive Solid Geometry

We present a neural architecture that takes as input a 2D or 3D shape and outputs a program that generates the shape. The instructions in our program are based on constructive solid geometry principles, i.e., a set of boolean operations on shape primitives defined recursively. Bottom-up techniques for this shape parsing task rely on primitive detection and are inherently slow since the search space over possible primitive combinations is large. In contrast, our model uses a recurrent neural network that parses the input shape in a top-down manner, which is significantly faster and yields a compact and easy-to-interpret sequence of modeling instructions. Our model is also more effective as a shape detector compared to existing state-of-the-art detection techniques. We finally demonstrate that our network can be trained on novel datasets without ground-truth program annotations through policy gradient techniques.
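To make the idea of a shape program concrete, the following is a minimal sketch of how a constructive solid geometry program could be executed on a small 2D pixel grid. It is not the paper's implementation: the postfix (stack-based) instruction order, the primitive names `circle` and `square`, the operation names, and the 16x16 grid size are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

Pixel = Tuple[int, int]
Shape = Set[Pixel]
# An instruction is either a primitive (kind, cx, cy, size) or an operation name.
Instruction = Union[Tuple[str, int, int, int], str]

GRID = 16  # hypothetical canvas size


def circle(cx: int, cy: int, r: int) -> Shape:
    """Filled circle of radius r centred at (cx, cy)."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r}


def square(cx: int, cy: int, half: int) -> Shape:
    """Filled axis-aligned square with half-width `half` centred at (cx, cy)."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if abs(x - cx) <= half and abs(y - cy) <= half}


PRIMITIVES: Dict[str, Callable[[int, int, int], Shape]] = {
    "circle": circle,
    "square": square,
}

# Boolean operations on shapes, expressed as set operations on pixels.
OPERATIONS: Dict[str, Callable[[Shape, Shape], Shape]] = {
    "union": lambda a, b: a | b,
    "intersect": lambda a, b: a & b,
    "subtract": lambda a, b: a - b,
}


def execute(program: List[Instruction]) -> Shape:
    """Run a postfix CSG program and return the resulting set of pixels."""
    stack: List[Shape] = []
    for instr in program:
        if isinstance(instr, str):
            # Operations consume the two most recent shapes.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[instr](left, right))
        else:
            kind, cx, cy, size = instr
            stack.append(PRIMITIVES[kind](cx, cy, size))
    if len(stack) != 1:
        raise ValueError("program did not reduce to a single shape")
    return stack[0]


# A square with a circular hole: square(8, 8, 4) minus circle(8, 8, 2).
program: List[Instruction] = [("square", 8, 8, 4), ("circle", 8, 8, 2), "subtract"]
shape = execute(program)
```

Under these assumptions, a shape parser would take the rendered pixels as input and try to recover a short instruction sequence such as `program`, which is far more compact than the pixel set it produces.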
