Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study

Separating non-text components from text components in handwritten document images is an important but relatively unexplored research problem. This paper presents an empirical study of how well various Local Binary Pattern (LBP) based texture features apply to this problem. It also proposes a minor modification to one variant of the LBP operator to improve text/non-text classification performance. The feature descriptors are evaluated with five well-known classifiers on a database made up of images from 104 handwritten laboratory copies and class notes of various engineering and science branches. The classification results indicate that LBP-based feature descriptors are effective for text/non-text separation.
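
For readers unfamiliar with LBP, the following is a minimal sketch of the basic 8-neighbour LBP operator and its histogram descriptor. It is not the modified variant proposed in this paper, whose details the abstract does not give. The neighbour ordering, the `>=` thresholding convention, and the function names `lbp_code` and `lbp_histogram` are assumptions made for illustration only.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
# This ordering is an assumption; any fixed ordering gives a valid LBP code.
NEIGHBOUR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1), (0, 1),
    (1, 1), (1, 0), (1, -1), (0, -1),
]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the basic 8-bit LBP code of an interior pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[float]:
    """Return the normalised 256-bin histogram of LBP codes.

    Border pixels are skipped because they lack a full neighbourhood.
    """
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in hist]
```

In a text/non-text setting, a histogram like this would typically be computed per component or region and passed to a classifier as the feature vector; that pairing is a general usage pattern, not a description of the authors' pipeline.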
