AutoSweep: Recovering 3D Editable Objects from a Single Photograph

This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph. Unlike previous methods, which recover depth maps, point clouds, or mesh surfaces, we aim to recover 3D objects that have semantic parts and can be edited directly. We base our work on the assumption that most human-made objects are composed of parts and that these parts can be well represented by generalized primitives. Our work is an attempt at recovering two types of primitive-shaped objects, namely generalized cuboids and generalized cylinders. To this end, we build a novel instance-aware segmentation network, GeoNet, for accurate part separation. GeoNet outputs a set of smooth part-level masks labeled as profiles and bodies. Then, in a key stage, we simultaneously identify profile-body relations and recover 3D parts by sweeping each recognized profile along its body contour and jointly optimizing the geometry to align with the recovered masks. Qualitative and quantitative experiments show that our algorithm recovers high-quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
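To illustrate what sweeping a profile to form a generalized cylinder can mean geometrically, the following is a minimal sketch and not the method of the paper. It assumes a straight sweep axis along z and a uniform scale per cross-section; the function names `circle_profile` and `sweep` are hypothetical. Profile-body matching and the joint optimization against the masks are not shown.

```python
import math


def circle_profile(n_sides, radius=1.0):
    """Return n_sides points on a circle in the local xy-plane."""
    return [
        (radius * math.cos(2 * math.pi * k / n_sides),
         radius * math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def sweep(profile, heights, scales):
    """Sweep a closed 2D profile along the z-axis (an assumed straight axis).

    heights[i] is the z position of the i-th cross-section and scales[i]
    is its uniform scale factor. Returns (vertices, quads), where each
    quad indexes four vertices joining two neighbouring cross-sections.
    """
    if len(heights) != len(scales):
        raise ValueError("heights and scales must have the same length")
    n = len(profile)

    # One scaled copy of the profile per cross-section.
    vertices = []
    for z, s in zip(heights, scales):
        for x, y in profile:
            vertices.append((s * x, s * y, z))

    # Connect each profile edge to the same edge on the next cross-section.
    quads = []
    for i in range(len(heights) - 1):
        for k in range(n):
            a = i * n + k
            b = i * n + (k + 1) % n
            quads.append((a, b, b + n, a + n))
    return vertices, quads


# Example: an octagonal profile swept through three cross-sections,
# narrowing in the middle.
profile = circle_profile(8)
vertices, quads = sweep(profile, heights=[0.0, 0.5, 1.0], scales=[1.0, 0.6, 1.0])
```

Because the result is defined by a profile plus per-section parameters rather than a raw mesh, changing `scales` or `heights` reshapes the part directly, which is the sense in which a swept primitive stays editable.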
