A General Approach for Multi-oriented Text Line Extraction of Handwritten Documents

Multi-orientation occurs frequently in ancient handwritten documents, where writers updated a document by adding annotations in the margins. Because the margins are narrow, these annotations give rise to lines in different directions and orientations. Document recognition must find the lines wherever they are written, whatever their orientation. In this paper, we therefore propose a new approach for extracting multi-oriented lines from scanned documents. Because the lines are multi-oriented and dispersed across the page, we use an image meshing that lets us determine the lines progressively and locally. Once the meshing is established, the orientation is determined by applying the Wigner–Ville distribution to the projection histogram profile. This local orientation is then extended to the neighboring cells to delimit the orientation in the neighborhood. Afterward, the text lines are extracted locally in each zone by following the orientation lines and using the proximity of connected components. Finally, connected components that overlap or touch across adjacent lines are separated; for this step, the morphology of the terminal letters of Arabic words is analyzed. The proposed approach was evaluated on 100 documents and reached an accuracy of about 98.6%.
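The meshing and orientation steps described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The page is assumed to be given as a list of black-pixel coordinates `(x, y)`, the mesh is assumed to be a regular grid of square cells, and each cell's projection profile is computed for a set of candidate angles. The angle whose pseudo Wigner–Ville distribution (a WVD with a rectangular lag window) reaches the highest peak is kept. This selection criterion is an assumption. The function names `mesh`, `projection_profile`, `pseudo_wigner_ville` and `estimate_orientation`, as well as the parameters `cell_size`, `max_lag` and `n_freqs`, are hypothetical.

```python
import math


def mesh(pixels, cell_size):
    """Group black-pixel coordinates (x, y) into the square cells of a regular mesh.

    Hypothetical helper: the paper's actual meshing scheme may differ.
    """
    cells = {}
    for x, y in pixels:
        cells.setdefault((x // cell_size, y // cell_size), []).append((x, y))
    return cells


def projection_profile(pixels, angle_deg):
    """Histogram of black pixels projected on the normal to direction angle_deg.

    Pixels on a line y = x*tan(angle) + c share the coordinate
    rho = y*cos(angle) - x*sin(angle), so the profile is sharpest
    when angle_deg matches the orientation of the text lines.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rhos = [round(y * cos_a - x * sin_a) for x, y in pixels]
    lo = min(rhos)
    profile = [0] * (max(rhos) - lo + 1)
    for r in rhos:
        profile[r - lo] += 1
    return profile


def pseudo_wigner_ville(signal, max_lag=16, n_freqs=16):
    """Discrete pseudo Wigner-Ville distribution of a real signal.

    W[n][k] = sum_{|m| <= max_lag} x[n+m] * x[n-m] * exp(-2j * theta_k * m),
    with theta_k = pi * k / (2 * n_freqs). For a real signal the lag kernel
    is symmetric in m, so W is real and reduces to a cosine sum.
    """
    n_samples = len(signal)
    # cos(2 * theta_k * m) = cos(pi * k * m / n_freqs), precomputed once.
    cos_table = [[math.cos(math.pi * k * m / n_freqs) for m in range(max_lag + 1)]
                 for k in range(n_freqs)]

    def sample(i):
        # Zero-padding outside the profile.
        return signal[i] if 0 <= i < n_samples else 0

    wvd = []
    for n in range(n_samples):
        kernel = [sample(n + m) * sample(n - m) for m in range(max_lag + 1)]
        row = []
        for k in range(n_freqs):
            lagged = sum(kernel[m] * cos_table[k][m] for m in range(1, max_lag + 1))
            row.append(kernel[0] + 2 * lagged)
        wvd.append(row)
    return wvd


def estimate_orientation(pixels, candidate_angles, max_lag=16, n_freqs=16):
    """Return the candidate angle whose projection profile has the highest WVD peak."""
    best_angle, best_score = None, float("-inf")
    for angle in candidate_angles:
        wvd = pseudo_wigner_ville(projection_profile(pixels, angle), max_lag, n_freqs)
        score = max(max(row) for row in wvd)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

For instance, `{cell: estimate_orientation(pts, range(-30, 31, 5)) for cell, pts in mesh(pixels, 64).items()}` yields one orientation per cell. Because projection profiles are non-negative, the largest WVD value always lies in the zero-frequency column. The score therefore rewards profiles whose ink is concentrated in a few narrow bins, which happens when the candidate angle matches the line direction. The extension to neighboring cells, the line tracking, and the Arabic-specific separation of touching components are not covered by this sketch.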
