Local Gradient Difference Features for Classification of 2D-3D Natural Scene Text Images

Methods developed for normal 2D text detection do not work well for text rendered with decorative or 3D effects. This paper proposes a new method for classifying natural scene text images as 2D or 3D, so that an appropriate recognition method can be chosen from the classification result and performance improves. The proposed method uses local gradient differences to obtain candidate pixels, which represent strokes. To study the spatial distribution of candidate pixels, we propose a measure, called COLD, which is dense for pixels toward the center of strokes and scattered for non-stroke pixels. This observation leads us to introduce mass features that extract the regular spatial pattern of COLD, which indicates a 2D text image. The extracted features are fed into a Neural Network (NN) for classification. The proposed method is tested on (i) a new dataset introduced in this work, (ii) a second dataset assembled from standard natural scene datasets, and (iii) non-text image datasets, which contain objects rather than text. Experimental results on images with and without text show that the proposed method is independent of text. Applying the proposed classification first significantly improves text detection and recognition performance.
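The abstract does not specify how the local gradient difference is computed, so the following is only a minimal sketch of one plausible reading: take the gradient magnitude of a grayscale image, measure the max-minus-min spread of that magnitude in a small window around each pixel, and keep pixels whose spread is large. The function names, the `window` size, and the `rel_threshold` parameter are all hypothetical and are not taken from the paper; COLD and the mass features are not modeled here.

```python
import numpy as np


def local_gradient_difference(gray, window=3):
    """Max-minus-min of gradient magnitude in a window x window neighborhood.

    Assumption: this operator is an illustrative stand-in, not the paper's
    definition of local gradient difference.
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)            # derivatives along rows and columns
    mag = np.hypot(gx, gy)                # gradient magnitude per pixel
    pad = window // 2
    padded = np.pad(mag, pad, mode="edge")
    # One (window x window) patch per pixel; output has the input's shape.
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    return patches.max(axis=(-2, -1)) - patches.min(axis=(-2, -1))


def candidate_pixels(gray, window=3, rel_threshold=0.5):
    """Boolean mask of pixels whose local gradient difference is large.

    Assumption: a threshold relative to the image-wide maximum is used only
    for illustration; the paper may select candidates differently.
    """
    diff = local_gradient_difference(gray, window)
    peak = diff.max()
    if peak == 0:                         # flat image: no candidates
        return np.zeros(diff.shape, dtype=bool)
    return diff >= rel_threshold * peak


if __name__ == "__main__":
    # Synthetic 9x9 image with a one-pixel-wide vertical stroke.
    image = np.zeros((9, 9))
    image[:, 4] = 1.0
    mask = candidate_pixels(image)
    print(mask.astype(int))
```

On the synthetic stroke, the mask marks the stroke column and its immediate surroundings and leaves the flat background unmarked, which is the kind of stroke-centered candidate set the abstract describes.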
