Evolution maps and applications

Common tasks in document analysis, such as binarization and line extraction, are still considered difficult for highly degraded text documents. Reliable fundamental information about the characters of a document, such as the distribution of character dimensions and stroke width, can significantly improve the performance of these tasks. We introduce a novel perspective on the image data that maps the evolution of connected components as the gray-scale threshold changes. The maps reveal significant information about the sets of elements in the document, such as characters, noise, stains, and words. We further employ this information to improve a state-of-the-art binarization algorithm and to automatically estimate character size, extract text lines, estimate stroke width, and analyze feature distributions, all of which are hard tasks for highly degraded documents.
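The core idea, tracking connected components while the gray-scale threshold sweeps across its range, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes dark ink on a light background (a pixel counts as foreground when its value is below the threshold t), uses 8-connectivity, and records only component heights. The names `components_at`, `evolution_map`, and `dominant_height` are hypothetical. Taking the most frequent height across all thresholds as a character-size estimate is likewise an assumed heuristic.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Optional, Tuple

# Grayscale image as rows of 0..255 values; ink is assumed to be dark (low values).
Image = List[List[int]]


def components_at(img: Image, t: int) -> List[Tuple[int, int]]:
    """Return (height, width) of each 8-connected component of pixels < t."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or img[r][c] >= t:
                continue
            # Breadth-first flood fill of one foreground component,
            # tracking its bounding box as we go.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and img[ny][nx] < t):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((r1 - r0 + 1, c1 - c0 + 1))
    return boxes


def evolution_map(img: Image, thresholds: Iterable[int]) -> Dict[int, Counter]:
    """Map each threshold to a histogram {component height: count}."""
    return {t: Counter(h for h, _ in components_at(img, t)) for t in thresholds}


def dominant_height(emap: Dict[int, Counter]) -> Optional[int]:
    """Hypothetical heuristic: the height seen most often across all thresholds."""
    total = Counter()
    for hist in emap.values():
        total.update(hist)  # adds counts rather than replacing them
    return total.most_common(1)[0][0] if total else None
```

Each threshold is processed independently here, which keeps the sketch short but repeats the flood fill for every threshold; a real system would likely compute the components incrementally as the threshold rises.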
