Efficient Word Segmentation and Baseline Localization in Handwritten Documents Using Isothetic Covers

Analysis of handwritten documents is a challenging task in document digitization. It requires efficient preprocessing, including word segmentation and baseline detection. This paper proposes an approach to both tasks based on structural properties of the isothetic covers that tightly enclose the words in a handwritten document. For an appropriate grid size, the isothetic covers segregate the words so that each cover corresponds to exactly one word. The grid size is selected by an adaptive technique that classifies the inter-cover distances into two classes in an unsupervised manner. Finally, a geometric heuristic applied to the horizontal chords of these covers extracts the corresponding baselines. Because the method traverses the word boundaries combinatorially and uses a limited set of operations strictly in the integer domain, it is fast and robust, as demonstrated by experimental results on datasets of both Bengali and English handwriting.
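
The abstract does not say which unsupervised technique separates the inter-cover distances into two classes. As an illustration only, the sketch below assumes an Otsu-style criterion: it sorts the gap lengths and picks the cut that maximizes the between-class variance. The function name `split_gaps` and the sample gap values are hypothetical, and how the resulting split would feed back into the grid-size choice is not covered here.

```python
from typing import List, Tuple


def split_gaps(gaps: List[int]) -> Tuple[int, List[int], List[int]]:
    """Split 1-D gap lengths into a 'small' and a 'large' class.

    Assumption: an Otsu-style split stands in for the paper's
    unspecified unsupervised classifier. Every cut between two
    distinct consecutive sorted values is tried, and the cut with
    the largest between-class variance is kept.
    """
    if len(gaps) < 2:
        raise ValueError("need at least two gaps to form two classes")

    values = sorted(gaps)
    n = len(values)
    total = sum(values)

    best_score = -1.0
    best_cut = 0
    left_sum = 0
    for cut in range(1, n):
        left_sum += values[cut - 1]
        if values[cut] == values[cut - 1]:
            continue  # equal values cannot be placed in different classes
        mean_left = left_sum / cut
        mean_right = (total - left_sum) / (n - cut)
        # Proportional to the between-class variance of the two groups.
        score = cut * (n - cut) * (mean_left - mean_right) ** 2
        if score > best_score:
            best_score = score
            best_cut = cut

    if best_cut == 0:
        raise ValueError("all gaps are equal; no two-class split exists")

    threshold = values[best_cut]  # smallest gap assigned to the large class
    return threshold, values[:best_cut], values[best_cut:]


# Hypothetical gap lengths (in pixels) between neighbouring covers on a line.
sample_gaps = [2, 3, 2, 4, 3, 15, 18, 16, 3, 2, 17]
threshold, small, large = split_gaps(sample_gaps)
print(threshold, small, large)
```

Under this reading, gaps in the small class would be treated as spacing inside a word and gaps in the large class as spacing between words.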
