Fast RT-LoG operator for scene text detection

This paper proposes a new real-time Laplacian of Gaussian (RT-LoG) operator for scene text detection. The method exploits the distribution of the Gaussian kernel in the spatial and scale-space domains, together with kernel decomposition using box filtering. Two levels of optimization are proposed. The first, within the spatial domain, is obtained by box mutualization. The second, within the spatial and scale-space domains, uses a mixed method for box selection. The RT-LoG operator is evaluated on the ICDAR2017 RRC-MLT dataset in terms of robustness and processing time, and the results are compared with state-of-the-art real-time operators for scene text detection. The proposed operator offers the best trade-off between robustness and processing time. It can process approximately 30 frames per second (FPS) at resolutions up to Quad-HD on a regular CPU architecture, with low latency. In addition, the operator can support a full scene text detection pipeline. The resulting system is competitive with the most accurate systems in the literature while requiring about two orders of magnitude fewer processing resources.
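The general idea behind box-filter approximations of the LoG can be illustrated with a minimal sketch. This is not the authors' RT-LoG operator. It assumes a grayscale image stored as a list of rows and approximates a sign-flipped LoG response by the difference between an inner and an outer box mean (a crude center-surround approximation). Every box sum is computed in constant time from a single summed-area table that is shared across all scales. The function names and the radius pairs are hypothetical choices for illustration. The paper's kernel decomposition, box mutualization, and box selection are more refined than this.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a zero row/column: s[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_sum(s: Image, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum of img[y0:y1][x0:x1] (half-open ranges) with four table lookups."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]


def box_mean(s: Image, y: int, x: int, r: int, h: int, w: int) -> float:
    """Mean over the (2r+1)x(2r+1) box centred on (y, x), clipped to the image borders."""
    y0, x0 = max(0, y - r), max(0, x - r)
    y1, x1 = min(h, y + r + 1), min(w, x + r + 1)
    return box_sum(s, y0, x0, y1, x1) / ((y1 - y0) * (x1 - x0))


def multiscale_response(
    img: Image, radii_pairs: List[Tuple[int, int]]
) -> Dict[Tuple[int, int], Image]:
    """Center-surround box responses (inner mean - outer mean) for each (r_in, r_out) pair.

    A bright blob on a dark background gives a positive response, like -LoG.
    The summed-area table is built once and reused by every scale.
    """
    h, w = len(img), len(img[0])
    s = integral_image(img)
    out: Dict[Tuple[int, int], Image] = {}
    for r_in, r_out in radii_pairs:
        out[(r_in, r_out)] = [
            [box_mean(s, y, x, r_in, h, w) - box_mean(s, y, x, r_out, h, w) for x in range(w)]
            for y in range(h)
        ]
    return out
```

Because the summed-area table is built once, each additional scale costs a constant number of lookups per pixel regardless of box size. This property is what makes box-based LoG approximations attractive for real-time processing.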
