Robust Text Line Segmentation for Historical Manuscript Images Using Color and Texture

In this paper we present a novel text line segmentation method for historical manuscript images. We use a pyramidal approach where at the first level, pixels are classified into: text, background, decoration, and out of page, at the second level, text regions are split into text line and non text line. Color and texture features based on Local Binary Patterns and Gabor Dominant Orientation are used for classification. By applying a modified Fast Correlation-Based Filter feature selection algorithm, redundant and irrelevant features are removed. Finally, the text line segmentation results are refined by a smoothing post-processing procedure. Unlike other projection profile or connected components methods, the proposed algorithm does not use any script-specific knowledge and is applicable to color images. The proposed algorithm is evaluated on three historical manuscript image datasets of diverse nature and achieved an average precision of 91% and recall of 84%. Experiments also show that the proposed algorithm is robust with respect to changes of the writing style, page layout, and noise on the image.

[1]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[2]  Ioannis Pratikakis,et al.  Text line and word segmentation of handwritten documents , 2009, Pattern Recognit..

[3]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Its'hak Dinstein,et al.  2009 10th International Conference on Document Analysis and Recognition Line segmentation for degraded handwritten historical documents , 2022 .

[5]  Frank Lebourgeois,et al.  DEBORA: Digital AccEss to BOoks of the RenAissance , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[6]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[7]  Andreas Keller,et al.  Lexicon-free handwritten word spotting using character HMMs , 2012, Pattern Recognit. Lett..

[8]  Laurence Likforman-Sulem,et al.  Text line segmentation of historical documents: a survey , 2007, International Journal of Document Analysis and Recognition (IJDAR).

[9]  Zehong Yang,et al.  Recognition of gray character using gabor filters , 2002, Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002. (IEEE Cat.No.02EX5997).

[10]  Jean-Yves Ramel,et al.  AGORA: the interactive document image analysis tool of the BVH project , 2006, Second International Conference on Document Image Analysis for Libraries (DIAL'06).

[11]  Mallikarjun Hangarge,et al.  Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents , 2013, ArXiv.

[12]  Rolf Ingold,et al.  Multi Resolution Layout Analysis of Medieval Manuscripts Using Dynamic MLP , 2011, 2011 International Conference on Document Analysis and Recognition.

[13]  Rolf Ingold,et al.  Evaluation of SVM, MLP and GMM Classifiers for Layout Analysis of Historical Documents , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[14]  Fei Yin,et al.  Handwritten Chinese text line segmentation by clustering with distance metric learning , 2009, Pattern Recognit..

[15]  Matti Pietikäinen,et al.  A comparative study of texture measures with classification based on featured distributions , 1996, Pattern Recognit..

[16]  Apostolos Antonacopoulos,et al.  Document image analysis for World War II personal records , 2004, First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings..

[17]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yi Li,et al.  Script-Independent Text Line Segmentation in Freestyle Handwritten Documents , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Nong Sang,et al.  Local Binary Pattern histogram based Texton learning for texture classification , 2011, 2011 18th IEEE International Conference on Image Processing.

[20]  Matti Pietikäinen,et al.  Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[22]  Masatoshi Ishikawa,et al.  Document Image Retrieval to Support Reading Mokkans , 2008, 2008 The Eighth IAPR International Workshop on Document Analysis Systems.