Paper Information - A Teacher-Student Learning Based Born-Again Training Approach to Improving Scene Text Detection Accuracy

With the recent success of convolutional neural network (CNN) based text detection approaches, designing better CNN-based text detection frameworks has become a major research focus for improving text detection accuracy. In this paper, instead of following this direction, we propose a born-again training strategy, based on teacher-student learning (TSL), to improve the accuracy of state-of-the-art CNN-based text detectors. More specifically, given a well-trained CNN-based text detector, we take it as a teacher model and train a new student model with the same topology from scratch, under the supervision of both the teacher model and the ground-truth labels. Furthermore, we propose a new proposal-free multi-level feature mimicking approach that allows multi-level convolutional feature maps to be mimicked effectively in a unified manner. Experiments demonstrate that student models trained with the proposed approach achieve substantially better results than their teacher models and generalize better.
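The training objective described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's formulation: the abstract does not give the loss, so the sketch assumes a plain mean-squared-error mimicking term computed over every position of each feature level (one possible reading of "proposal-free"), summed across levels and added to the ground-truth detection loss. The names `mimic_loss_one_level`, `born_again_loss`, and `mimic_weight` are hypothetical.

```python
from typing import Sequence

# One feature level, represented as rows of activation values (assumption:
# channels and spatial positions are already flattened into this 2-D layout).
FeatureMap = Sequence[Sequence[float]]


def mimic_loss_one_level(teacher: FeatureMap, student: FeatureMap) -> float:
    """Mean squared error over every position of one feature level."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of rows")
    total, count = 0.0, 0
    for t_row, s_row in zip(teacher, student):
        if len(t_row) != len(s_row):
            raise ValueError("teacher and student rows must have equal length")
        for t, s in zip(t_row, s_row):
            total += (t - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("feature maps must not be empty")
    return total / count


def born_again_loss(
    gt_loss: float,
    teacher_levels: Sequence[FeatureMap],
    student_levels: Sequence[FeatureMap],
    mimic_weight: float = 1.0,
) -> float:
    """Ground-truth loss plus a weighted sum of per-level mimicking losses.

    The teacher and student share one topology, so their feature levels
    are expected to match one-to-one in number and shape.
    """
    if len(teacher_levels) != len(student_levels):
        raise ValueError("teacher and student must expose the same levels")
    mimic = sum(
        mimic_loss_one_level(t, s)
        for t, s in zip(teacher_levels, student_levels)
    )
    return gt_loss + mimic_weight * mimic


# Toy example with two feature levels of different sizes.
teacher = [[[1.0, 2.0], [3.0, 4.0]], [[0.0]]]
student = [[[1.0, 2.0], [3.0, 2.0]], [[1.0]]]
total = born_again_loss(gt_loss=0.5, teacher_levels=teacher,
                        student_levels=student, mimic_weight=0.5)
```

In a real detector the per-level terms would be computed on tensors inside the training loop, with the teacher's weights frozen; the weighting between the two supervision signals is a tunable choice that this sketch exposes as `mimic_weight`.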

Lei Sun | Zhuoyao Zhong | Qiang Huo
