Lexicon-based offline recognition of Amharic words in unconstrained handwritten text

This paper describes an offline handwriting recognition system for Amharic words based on lexicon. The system computes direction fields of scanned handwritten documents, from which pseudo-characters are segmented. The pseudo-characters are organized based on their proximity and direction to form text lines. Words are then segmented by analyzing the relative gap between subsequent pseudo-characters in text lines. For each segmented word image, the structural characteristics of pseudo-characters are syntactically analyzed to predict a set of plausible characters forming the word. The most likelihood word is finally selected among candidates by matching against the lexicon. The system is tested by a database of unconstrained handwritten Amharic documents collected from various sources. The lexicon is prepared from words appearing in the collected database.

[1]  Josef Bigun,et al.  Vision with Direction: A Systematic Introduction to Image Processing and Computer Vision , 2010 .

[2]  Sargur N. Srihari,et al.  On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Yaregal Assabie,et al.  Writer-independent Offline Recognition of Handwritten Ethiopic Characters , 2008, ICFHR 2008.

[4]  Horst Bunke,et al.  Recognition of cursive Roman handwriting: past, present and future , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[5]  Robert Sabourin,et al.  Large vocabulary off-line handwriting recognition: A survey , 2003, Pattern Analysis & Applications.