DeepPrimitive: Image decomposition by layered primitive detection

Perceiving the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of it. Accordingly, efforts to find primitive-based geometric interpretations of visual data date back to 1970s studies of visual media. However, because primitive fitting was difficult in the pre-deep-learning era, this line of research faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework that detects primitives in images layer by layer by modifying the YOLO network; an RNN with a novel loss function then equips this network with the ability to predict primitives that have a variable number of parameters. We compare our pipeline with traditional methods and with other learning-based baselines, and show that our layered detection model achieves higher accuracy and better reconstruction.
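To make the idea of layered detection concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a toy scene of axis-aligned rectangles with a known drawing order, and it replaces the learned detector (the modified YOLO network) with a hypothetical oracle, `unoccluded`, that returns every primitive with nothing drawn on top of it. The names `Rect`, `overlaps`, `unoccluded`, and `decompose` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Toy axis-aligned rectangle primitive; z is its drawing order (higher = on top)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int
    z: int


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of the two rectangles intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def unoccluded(scene: list[Rect]) -> list[Rect]:
    """Hypothetical stand-in for a learned detector.

    Returns the primitives that no other primitive in the scene is drawn over.
    """
    return [
        p for p in scene
        if not any(q.z > p.z and overlaps(p, q) for q in scene)
    ]


def decompose(scene: list[Rect]) -> list[list[str]]:
    """Peel the scene layer by layer, topmost layer first."""
    remaining = list(scene)
    layers = []
    while remaining:
        # The primitive with the largest z is never occluded, so each pass
        # removes at least one primitive and the loop terminates.
        top = unoccluded(remaining)
        layers.append(sorted(p.name for p in top))
        remaining = [p for p in remaining if p not in top]
    return layers


scene = [
    Rect("background", 0, 0, 10, 10, z=0),
    Rect("card", 2, 2, 8, 8, z=1),
    Rect("badge", 6, 6, 9, 9, z=2),
    Rect("dot", 0, 0, 1, 1, z=1),
]
layers = decompose(scene)
print(layers)  # [['badge', 'dot'], ['card'], ['background']]
```

In this toy setting the oracle has access to the true drawing order; a learned detector would have to infer from pixels alone which primitives are fully visible.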
