Novel Approach for Baseline Detection and Text Line Segmentation

Baseline detection and text line segmentation are essential preprocessing steps in any OCR system. In this paper we propose a robust and fast method for baseline detection based on analysis of the projection pattern of the Radon transform. The algorithm has been tested on more than 350 samples, including printed and handwritten Persian/Arabic, English, and multilingual documents. The results indicate that, despite narrow interline spacing and noisy components, the method extracts baselines precisely. In addition, in the case of multi-frequency projection patterns, the proposed method is shown to maintain accurate baseline detection.

General Terms: Image Processing, Document Analysis.

Keywords: Optical Character Recognition, Document Analysis, Multilingual Documents, Radon Transform, Neural Networks.
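The projection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only assumes that, for an unskewed binary page, the Radon projection along the horizontal direction reduces to per-row ink counts, and that baseline rows show up as strong peaks in that profile. The helper names (`horizontal_projection`, `find_baseline_rows`, `make_synthetic_page`), the 0.5 relative threshold, and the toy page are all hypothetical choices made for illustration.

```python
def horizontal_projection(binary_img):
    # Count ink pixels (value 1) in each row. For an unskewed page this is
    # the Radon projection taken along the horizontal direction.
    return [sum(row) for row in binary_img]


def find_baseline_rows(binary_img, min_rel_height=0.5):
    # Hypothetical helper: report rows whose projection value is a local
    # maximum and at least min_rel_height times the global maximum.
    profile = horizontal_projection(binary_img)
    if not profile or max(profile) == 0:
        return []
    threshold = min_rel_height * max(profile)
    rows = []
    for y in range(1, len(profile) - 1):
        is_peak = profile[y] > profile[y - 1] and profile[y] >= profile[y + 1]
        if is_peak and profile[y] >= threshold:
            rows.append(y)
    return rows


def make_synthetic_page(height=64, width=100, baselines=(15, 35, 55)):
    # Toy page (an assumption, not real data): each text line has a sparse
    # five-row body above one dense row that stands in for the baseline.
    page = [[0] * width for _ in range(height)]
    for b in baselines:
        for y in range(b - 5, b):
            for x in range(0, width, 4):
                page[y][x] = 1
        page[b] = [1] * width
    return page


page = make_synthetic_page()
print(find_baseline_rows(page))  # prints [15, 35, 55]
```

On real scans the profile would be noisy, so some smoothing before peak picking, and a search over projection angles to handle skew, would likely be needed; both are omitted here to keep the sketch short.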
