Faster BiSeNet: A Faster Bilateral Segmentation Network for Real-time Semantic Segmentation

With the rising demand for dense estimation tasks in mobile applications, real-time semantic segmentation is increasingly desirable. We propose a faster bilateral segmentation network (Faster BiSeNet) based on BiSeNetV2, which promotes feature fusion between the spatial and semantic branches with a more compact structure to improve real-time performance. Our lightweight design strengthens the connection between the two branches in both shallow and deep layers through 1) Shallow feature sharing: using a simple linear operation to transform shallow features of the spatial branch into those needed by the semantic branch; 2) Deep feature aggregation: introducing a Gated Guided Aggregation Layer that uses a gating mechanism to guide appropriate spatial information to supplement the details missing from the semantic branch, which avoids using unnecessary convolutions to select the aggregated features; 3) Auxiliary edge loss: making the gated output of the spatial branch focus on important boundary-related information, which facilitates the fusion of valuable edge information into the predicted results. Extensive experiments demonstrate that our approach achieves better real-time performance than several state-of-the-art real-time semantic segmentation methods. Compared with BiSeNetV2, Faster BiSeNet achieves 72.8% $(\uparrow 0.2\%)$ and 74.5% $(\uparrow 2.1\%)$ mean IoU at inference speeds of 187 FPS $(\uparrow 31\ \text{FPS})$ and 154 FPS $(\uparrow 29.5\ \text{FPS})$ on the challenging Cityscapes test set and CamVid validation set, respectively, on one NVIDIA GeForce GTX 1080Ti card. This demonstrates that our method achieves competitive segmentation performance at real-time speed.
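To make the gating idea in item 2) concrete, the following is a minimal sketch of a generic element-wise gated fusion, not the actual Gated Guided Aggregation Layer, whose exact form is not given here. It assumes features are flat lists of floats, that the gate is a sigmoid applied to hypothetical per-element logits, and that the gated spatial features are added to the semantic features. The names `gated_fuse` and `gate_logits` are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(semantic, spatial, gate_logits):
    """Add spatial features to semantic features, scaled by a sigmoid gate.

    Assumption: a generic additive gate, fused[i] = semantic[i] + g[i] * spatial[i],
    with g[i] = sigmoid(gate_logits[i]). The real layer may differ.
    """
    if not (len(semantic) == len(spatial) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    return [
        sem + sigmoid(g) * sp
        for sem, sp, g in zip(semantic, spatial, gate_logits)
    ]


# Hypothetical toy features: a large positive logit passes the spatial detail
# through almost unchanged, a large negative logit suppresses it.
semantic = [0.2, 0.5, 0.1]
spatial = [1.0, 0.0, 0.8]
gate_logits = [4.0, 0.0, -4.0]
print(gated_fuse(semantic, spatial, gate_logits))
```

In this sketch the gate itself decides how much spatial detail reaches the fused output at each position, which illustrates why a gate can stand in for extra convolutions whose only job is to select features. Under the same assumptions, an auxiliary edge loss as in item 3) would be applied to the gated spatial term so that the gate opens mainly near boundaries.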