Detecting and recognizing numerical strings in Farsi document images

In this paper, we propose a new approach for detecting and recognizing numerical strings in Farsi/Arabic handwritten or machine-printed document images. Each connected component is labeled according to whether or not it belongs to a numerical string. First, to differentiate between digit and non-digit connected components, a set of simple features is extracted from every connected component in each text line. These features are then classified by a fuzzy rule-based classifier to extract candidate numerical strings. Each candidate is passed to a digit recognizer, and the syntax of the recognized string is then validated by a syntactic verifier. Experimental results show an acceptable detection rate with a low false-positive rate.
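The abstract does not list the features used. As a minimal sketch, assuming a binarized text-line image stored as a list of rows of 0/1 pixels, the Python code below labels 8-connected components with a breadth-first search and computes three illustrative size and shape features per component. The names `Component`, `connected_components` and `simple_features`, and the choice of features (height relative to the line height, aspect ratio, pixel density), are hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    """Inclusive bounding box and foreground pixel count of one component."""
    top: int
    left: int
    bottom: int
    right: int
    area: int


def connected_components(image):
    """Find 8-connected components of foreground (1) pixels with BFS."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    visited = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or visited[r][c]:
                continue
            visited[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all eight neighbours of (y, x).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not visited[ny][nx]):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            components.append(Component(top, left, bottom, right, area))
    return components


def simple_features(comp, line_height):
    """Hypothetical size/shape features, normalized by the text-line height."""
    height = comp.bottom - comp.top + 1
    width = comp.right - comp.left + 1
    return {
        "rel_height": height / line_height,
        "aspect_ratio": width / height,
        "density": comp.area / (width * height),
    }
```

Normalizing by the line height is one way to make the features comparable across lines written at different sizes; the normalization actually used in the paper is not stated in the abstract.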
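The rule base, membership functions and rule weights of the fuzzy classifier are not given in the abstract. The following sketch shows a generic single-winner fuzzy rule-based classifier: each rule has one fuzzy set per feature and a rule weight, the compatibility of a feature vector with a rule is the product of its membership grades, and the rule with the largest compatibility times weight assigns the class. The triangular partitions `SMALL`, `MEDIUM` and `LARGE`, the three rules in `RULES`, and the assumption that features are scaled to [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet a < b < c and peak at b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


@dataclass
class FuzzyRule:
    antecedents: Sequence[Membership]  # one fuzzy set per feature
    label: str                         # consequent class
    weight: float                      # rule weight in [0, 1]


def compatibility(rule: FuzzyRule, x: Sequence[float]) -> float:
    """Product of the membership grades of each feature in the rule's fuzzy sets."""
    grade = 1.0
    for mu, value in zip(rule.antecedents, x):
        grade *= mu(value)
    return grade


def classify(rules: Sequence[FuzzyRule], x: Sequence[float]) -> Optional[str]:
    """Single-winner reasoning: the rule with the largest compatibility * weight wins."""
    best_label, best_score = None, 0.0
    for rule in rules:
        score = compatibility(rule, x) * rule.weight
        if score > best_score:
            best_label, best_score = rule.label, score
    return best_label  # None when no rule fires


# Hypothetical partitions of a feature scaled to [0, 1].
SMALL = triangular(-0.5, 0.0, 0.5)
MEDIUM = triangular(0.0, 0.5, 1.0)
LARGE = triangular(0.5, 1.0, 1.5)

# Hypothetical rules over (relative height, aspect ratio).
RULES = [
    FuzzyRule((MEDIUM, SMALL), "digit", 0.9),
    FuzzyRule((MEDIUM, LARGE), "non-digit", 0.8),
    FuzzyRule((LARGE, MEDIUM), "non-digit", 0.7),
]
```

With these invented rules, `classify(RULES, (0.5, 0.1))` returns "digit", while `classify(RULES, (0.0, 1.0))` returns `None` because no rule fires; a practical system would need a default decision for that case.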
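How digit components are merged into candidate strings is not described in the abstract. One simple assumption, sketched below, is to sort the components of a line by their left edge and group consecutive digit-labeled components whose horizontal gap does not exceed a threshold. `LabeledComponent`, `candidate_strings` and the `max_gap` threshold are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LabeledComponent:
    left: int        # leftmost column of the bounding box
    right: int       # rightmost column of the bounding box
    is_digit: bool   # output of the digit / non-digit classifier


def candidate_strings(components, max_gap):
    """Group horizontally adjacent digit components into candidate strings."""
    groups, current = [], []
    for comp in sorted(components, key=lambda c: c.left):
        if comp.is_digit and (not current or comp.left - current[-1].right <= max_gap):
            current.append(comp)  # extend the current run of digits
        else:
            if current:
                groups.append(current)  # a non-digit or a large gap ends the run
            current = [comp] if comp.is_digit else []
    if current:
        groups.append(current)
    return groups
```

Grouping here depends only on spatial adjacency, so the right-to-left direction of the surrounding Farsi text does not affect it.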
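The syntax accepted by the verifier is not specified in the abstract. As an illustration only, the sketch below implements a deterministic finite automaton that accepts either digit groups separated by '/' (for example, dates) or a decimal number with a single '.', that is, the regular language D+(/D+)* | D+.D+ where D is any decimal digit. The state names, the transition table and the two separators are assumptions; a real verifier would encode whatever numerical-string formats the application expects.

```python
# Transition table of a hypothetical DFA; a missing entry means "reject".
TRANSITIONS = {
    ("start", "D"): "int",
    ("int", "D"): "int",
    ("int", "/"): "slash",
    ("int", "."): "dot",
    ("slash", "D"): "group",
    ("group", "D"): "group",
    ("group", "/"): "slash",
    ("dot", "D"): "frac",
    ("frac", "D"): "frac",
}
ACCEPTING = {"int", "group", "frac"}


def symbol_class(ch):
    """Map a recognized character to a DFA input symbol (None if unknown)."""
    # True for any Unicode decimal digit, including ASCII, Arabic-Indic
    # and Extended Arabic-Indic (Farsi) digits.
    if ch.isdecimal():
        return "D"
    if ch in "/.":
        return ch
    return None


def is_valid_numerical_string(text):
    """Run the DFA over the recognizer output and accept only in a final state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

Because `str.isdecimal()` is true for Farsi digits, a string such as "۱۳۸۸/۰۵/۱۲" is accepted, while "12/" and "1.2.3" are rejected.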
