Enhanced Facade Parsing for Street-Level Images Using Convolutional Neural Networks

Façade parsing is an essential step before the 3-D modeling of digital or virtual city models. Existing grammar-based approaches to façade parsing rely on strong prior knowledge but produce façade parts with better structure. Pixelwise-segmentation-based approaches require much less prior knowledge, but the structure of the resulting façade parts is usually incomplete. Both approaches are also limited by their heavy dependence on the data set and therefore cannot be applied to façade parsing in complex scenes. To address this issue, we built a large street-level data set from Mapillary images to serve as training data covering more general scenes. We also propose a new pipeline based on a convolutional neural network (CNN) that combines pixelwise segmentation and global object detection to improve façade parsing. Our pipeline can be applied both to rectified façade images and to street-level façade images with complex scenes. An ablation study demonstrates that the design of our pipeline is effective. We test our pipeline on the classic ECP2011 data set and on our new large street-level data set. It achieves state-of-the-art results on both: an accuracy of 98.2% and a mean average precision (mAP) of 98.8% on ECP2011, and an mAP of 81.1% for façade-part parsing on our street-level data set.