HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as input and reconstructs a planar graph depicting the underlying geometric structure. The approach detects corners and classifies edge candidates between corners in an end-to-end manner. Our contribution is a holistic edge classification architecture, which 1) initializes the feature of an edge candidate with a trigonometric positional encoding of its end-points; 2) fuses image features into each edge candidate through deformable attention; 3) employs two weight-sharing Transformer decoders to learn holistic structural patterns over the graph edge candidates; and 4) is trained with a masked learning strategy. The corner detector is a variant of the edge classification architecture, adapted to operate on pixels as corner candidates. We conduct experiments on two structured reconstruction tasks: outdoor building architecture reconstruction and indoor floorplan planar graph reconstruction. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art on both tasks. We will share code and models.
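To make step 1) concrete, the following is a minimal sketch of how an edge candidate could be initialized from a trigonometric positional encoding of its two end-points. It is an illustration under stated assumptions, not the paper's implementation: the function names (`sinusoidal_encoding`, `all_edge_candidates`, `edge_features`), the number of frequencies, and the temperature of 10000 (borrowed from the standard Transformer encoding) are all hypothetical choices, and the abstract does not specify how the end-point encodings are combined, so simple concatenation is assumed here.

```python
import itertools
import numpy as np


def sinusoidal_encoding(values, num_freqs=64, temperature=10000.0):
    """Encode scalar coordinates with sines and cosines at several frequencies.

    values: array of shape (...,). Returns an array of shape (..., 2 * num_freqs).
    The frequency schedule is an assumption modeled on the Transformer encoding.
    """
    values = np.asarray(values, dtype=np.float64)
    freqs = temperature ** (-np.arange(num_freqs) / num_freqs)
    angles = values[..., None] * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def all_edge_candidates(num_corners):
    """Enumerate every unordered corner pair as an edge candidate."""
    pairs = list(itertools.combinations(range(num_corners), 2))
    return np.array(pairs, dtype=np.int64).reshape(-1, 2)


def edge_features(corners, edges, num_freqs=64):
    """Build one initial feature per edge candidate from its two end-points.

    corners: (N, 2) array of (x, y) pixel coordinates.
    edges:   (M, 2) array of corner indices.
    Returns an (M, 8 * num_freqs) array: [enc(x1), enc(y1), enc(x2), enc(y2)].
    Concatenating the end-point encodings is an assumption of this sketch.
    """
    corners = np.asarray(corners, dtype=np.float64)
    edges = np.asarray(edges, dtype=np.int64)
    enc_x = sinusoidal_encoding(corners[:, 0], num_freqs)
    enc_y = sinusoidal_encoding(corners[:, 1], num_freqs)
    corner_enc = np.concatenate([enc_x, enc_y], axis=-1)
    return np.concatenate(
        [corner_enc[edges[:, 0]], corner_enc[edges[:, 1]]], axis=-1
    )


if __name__ == "__main__":
    # Three hypothetical detected corners and all pairs between them.
    corners = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 5.0]])
    edges = all_edge_candidates(len(corners))
    feats = edge_features(corners, edges, num_freqs=4)
    print(edges.tolist(), feats.shape)
```

In a full model, features like these would then be passed through a learned projection, fused with image features, and classified; those stages are outside the scope of this sketch.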
