CSENet: Cascade Semantic Erasing Network for Weakly-Supervised Semantic Segmentation

Abstract
Weakly-supervised semantic segmentation based on image-level annotations has difficulty exploiting pixel-level information. Most approaches adopt Class Activation Maps (CAMs) to localize initial object regions, called seeds. To cover more potential object parts, seed-expansion methods have drawn attention as a way to generate artificial masks. Because the seeds focus only on discriminative regions, however, it is challenging to spread them to the integral object. To tackle this problem, we propose a Cascade Semantic Erasing Network (CSENet) that expands seeds effectively and reasonably. In particular, CSENet sequentially stacks semantic erasing stages that progressively erase discriminative areas, forcing the network to exploit relevant feature responses in non-discriminative object regions. Moreover, CSENet suppresses seeds directly on the CAMs, which carry stronger semantics, rather than on the Intermediate Feature Maps (IFMs). Under this semantic guidance, the proposed erasing strategy correctly spreads seed regions to intra-class regions while preventing them from extending into unexpected inter-class areas. Extensive experiments demonstrate the effectiveness of the proposed CSENet: our approach achieves 62.3% and 63.4% mIoU on the PASCAL VOC 2012 validation and test sets, respectively.
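The abstract does not specify how the CAMs are computed or how discriminative regions are selected for erasing. The following is a minimal sketch under two assumptions: it uses the standard CAM formulation (the classifier weights of one class applied to the final feature maps), and it uses a relative threshold `tau`. The names `compute_cam`, `cascade_erase`, `num_stages`, and `tau` are hypothetical and do not come from the paper. The sketch shows only the bookkeeping of erasing seeds on the CAM itself, stage after stage, and accumulating them. In CSENet each stage is a trained network stage, so a later stage would produce a new response instead of reusing one fixed map as this toy does.

```python
import numpy as np


def compute_cam(features: np.ndarray, weights: np.ndarray, cls: int) -> np.ndarray:
    """Standard CAM (an assumption, not taken from the paper).

    features: (K, H, W) intermediate feature maps (IFMs)
    weights:  (C, K) classifier weights applied after global average pooling
    Returns an (H, W) map, ReLU-clipped and normalized to [0, 1].
    """
    cam = np.tensordot(weights[cls], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cascade_erase(cam: np.ndarray, num_stages: int = 3, tau: float = 0.5) -> np.ndarray:
    """Toy cascade: at each stage, the most discriminative pixels of the
    remaining CAM become seeds and are then erased on the CAM itself
    (not on the IFMs). Returns the union of seeds over all stages."""
    remaining = cam.copy()
    seeds = np.zeros(cam.shape, dtype=bool)
    for _ in range(num_stages):
        peak = remaining.max()
        if peak <= 0:  # nothing left to erase
            break
        stage_seeds = remaining >= tau * peak  # hypothetical relative threshold
        seeds |= stage_seeds
        remaining = np.where(stage_seeds, 0.0, remaining)  # erase on the CAM
    return seeds


cam = np.array([[1.0, 0.8, 0.4, 0.0],
                [0.6, 0.3, 0.1, 0.0]])
for k in (1, 2, 3):
    print(k, int(cascade_erase(cam, num_stages=k).sum()))
```

On this toy map the seed set grows from 3 to 5 to 6 pixels as stages are added, and zero-activation pixels are never seeded. Note that erasing a fixed map in this way would eventually absorb any weakly activated pixel, including background. Per the abstract, CSENet's semantic guidance on the CAMs is what is meant to keep expansion within intra-class regions.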