Perspective Plane Program Induction From a Single Image

We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, holistic scene representations further facilitate low-level image manipulation tasks such as inpainting. We formulate this problem as jointly finding the camera pose and the scene structure that best describe the input image. The benefits of such joint inference are two-fold: scene regularity serves as a new cue for perspective correction, and, in turn, accurate perspective correction leads to a simpler scene structure, much as the correct surface shape yields the most regular texture in shape from texture. Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to solve this problem efficiently. On a collection of Internet images, P3I outperforms a set of baselines on camera pose estimation, global structure inference, and downstream image manipulation.
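The joint-inference idea can be illustrated with a toy one-dimensional analogue. The sketch below is not the P3I implementation. It assumes a hypothetical 1D perspective model `u = x / (1 + t*x)` with a single tilt parameter `t`, and a hypothetical lattice "program" `x_k = x0 + k*d`. Scene regularity, measured as how well the lattice program explains the rectified points, scores each candidate tilt. A coarse exhaustive search picks a starting tilt, and finite-difference gradient descent then refines it. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def project(x, tilt):
    """Hypothetical 1D perspective map from plane coordinate x to image coordinate u."""
    return x / (1.0 + tilt * x)

def rectify(u, tilt):
    """Inverse of `project`: undo the assumed perspective for a candidate tilt."""
    return u / (1.0 - tilt * u)

def program_residual(x):
    """Fit the hypothetical lattice program x_k = x0 + k*d by least squares and
    return the scale-normalized residual (0 means perfectly regular)."""
    k = np.arange(len(x), dtype=float)
    A = np.stack([np.ones(len(x)), k], axis=1)
    (x0, d), *_ = np.linalg.lstsq(A, x, rcond=None)
    r = x - (x0 + d * k)
    # Divide by d**2 so that shrinking the whole scene cannot fake regularity.
    return float(np.mean(r ** 2) / d ** 2)

def infer_tilt(u, num_grid=51, max_tilt=0.5, steps=200, lr=1e-3, eps=1e-6):
    # Keep only tilts whose rectification is finite and order-preserving.
    grid = np.linspace(0.0, max_tilt, num_grid)
    grid = grid[1.0 - grid * u.max() > 0]
    # 1) Search: score every candidate tilt by how regular the rectified scene is.
    tilt = float(min(grid, key=lambda t: program_residual(rectify(u, t))))
    # 2) Gradient: refine the best candidate with central finite differences.
    for _ in range(steps):
        grad = (program_residual(rectify(u, tilt + eps))
                - program_residual(rectify(u, tilt - eps))) / (2 * eps)
        tilt -= lr * grad
    return tilt

if __name__ == "__main__":
    x_true = np.arange(1.0, 11.0)   # hypothetical evenly spaced texels on the plane
    u_obs = project(x_true, 0.237)  # observed, perspective-distorted positions
    print(infer_tilt(u_obs))        # should be close to 0.237
```

Rectifying with a wrong tilt `t` maps the evenly spaced points through a residual projective warp (`rectify(project(x, t0), t) == project(x, t0 - t)`), so in this toy setting the lattice fits exactly only at the true tilt. This is why regularity alone can recover the camera parameter here. The coarse search supplies a good starting point, and the gradient step recovers precision finer than the grid spacing.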
