Robust Segmentation of Unconstrained Online Handwritten Documents

A segmentation algorithm that can detect the different regions of a handwritten document, such as text lines, tables, and sketches, would be extremely useful in a variety of applications, including retrieval, translation, and genre classification. However, this task is extremely challenging for handwritten documents, which vary considerably in their structure and content. In this paper, we describe a robust segmentation method to detect the regions in an unstructured on-line handwritten document. We utilize the temporal information in on-line documents, along with their spatial layout, to improve the segmentation results. The properties of handwritten strokes are computed using a spline-based representation. We compute the most likely segmentation of the handwritten page using a parser based on a Stochastic Context Free Grammar (SCFG). The regions considered in this work include paragraphs, text lines, words, and non-text regions.
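
To illustrate what computing the most likely segmentation with an SCFG can look like, the following is a minimal sketch of Viterbi (probabilistic CYK) parsing over a sequence of stroke labels in temporal order. It is not the grammar or parser used in this work: the nonterminals `Page`, `Text`, and `NonText`, the stroke labels `"t"` and `"n"`, and all rule probabilities are hypothetical values chosen only for illustration, and the sketch assumes each stroke has already been classified as text-like or not.

```python
import math

# Hypothetical toy grammar in Chomsky normal form. The symbols and the
# probabilities are illustrative assumptions, not values from the paper.
BINARY_RULES = {
    "Page": [("Text", "Page", 0.3), ("NonText", "Page", 0.3),
             ("Text", "NonText", 0.2), ("NonText", "Text", 0.2)],
    "Text": [("Text", "Text", 0.4)],
    "NonText": [("NonText", "NonText", 0.3)],
}
# Lexical rules map a nonterminal to a stroke label ("t" = text-like stroke,
# "n" = non-text stroke).
LEXICAL_RULES = {("Text", "t"): 0.6, ("NonText", "n"): 0.7}


def _regions(best, i, j, symbol):
    """Walk the back-pointers and return the top-level regions under Page."""
    if symbol != "Page":
        return [(symbol, i, j)]
    k, left, right = best[(i, j, symbol)][1]
    return _regions(best, i, k, left) + _regions(best, k, j, right)


def viterbi_parse(labels, start="Page"):
    """Return (probability, regions) of the most likely parse, or None."""
    n = len(labels)
    best = {}  # (i, j, symbol) -> (log probability, back-pointer)

    # Spans of length 1 are covered by lexical rules.
    for i, label in enumerate(labels):
        for (symbol, terminal), p in LEXICAL_RULES.items():
            if terminal == label:
                best[(i, i + 1, symbol)] = (math.log(p), None)

    # Longer spans combine two adjacent sub-spans; keep the best score only.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for parent, rules in BINARY_RULES.items():
                for left, right, p in rules:
                    for k in range(i + 1, j):
                        l = best.get((i, k, left))
                        r = best.get((k, j, right))
                        if l is None or r is None:
                            continue
                        score = math.log(p) + l[0] + r[0]
                        key = (i, j, parent)
                        if key not in best or score > best[key][0]:
                            best[key] = (score, (k, left, right))

    if (0, n, start) not in best:
        return None
    return math.exp(best[(0, n, start)][0]), _regions(best, 0, n, start)
```

Calling `viterbi_parse(["t", "t", "n"])` on this toy grammar groups the first two strokes into one `Text` region and the last stroke into a `NonText` region, because that derivation has a higher probability than splitting the two text strokes into separate regions. Each region is reported as `(symbol, start, end)` with a half-open stroke index range.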
