Boundary-aware Arbitrary-shaped Scene Text Detector with Learnable Embedding Network

Benefiting from the popularity of deep learning theory, scene text detection algorithms have developed rapidly in recent years. Methods representing text region by text segmentation map are proved to capture arbitrary-shaped text in a more flexible and accurate way. However, such segmentation-based methods are prone to be disturbed by the text-like background patterns (like the fence, grass, etc.), which generally suffer from imprecise boundary detail problem. In this paper, LEMNet is proposed to handle the imprecise boundary problem by guiding the generation of text boundary based on a priori constraint. In the training stage, Boundary Segmentation Branch is firstly constructed to predict coarse boundary mask for each text instance. Then, through mapping pixels into an embedding space, the proposed Pixel Embedding Branch makes the embedding representation of boundary points learn to be more similar, meanwhile enlarging the characteristic distance between background points and boundary points. During inference, noise in the coarse boundary segmentation map can be effectively suppressed by a Noisy Point Suppression Algorithm among pixel embedding vectors. In this way, LEMNet can generate a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, we propose a Context Enhancement Module to capture feature interactions in different representation subspaces, in which features are parallelly performed attention and concatenated to generate enhanced features. Extensive experiments are conducted over four challenging datasets, which demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measure of 85.2%, 87.6% and 85.2% on CTW1500, Total-Text and MSRA-TD500 respectively, which is the latest SOTA.