KHATT: An open Arabic offline handwritten text database

Abstract A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research, especially given the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline handwritten text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively, which enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools were developed to extract paragraphs from pages and to segment paragraphs into lines. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers worldwide for research in various handwriting-related problems such as text recognition, writer identification and verification, forms analysis, pre-processing, and segmentation. Several international research groups and researchers have acquired the database for use in their research so far.
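
The random 70%/15%/15% division of forms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the split was drawn, so the function name `split_forms`, the integer form identifiers, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_forms(form_ids, train_pct=70, test_pct=15, seed=0):
    """Randomly split form IDs into training, testing, and verification sets.

    Hypothetical helper: the percentages follow the abstract, but the
    shuffling method and the seed are assumptions, not the KHATT procedure.
    """
    ids = list(form_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)

    # Integer arithmetic avoids floating-point rounding in the set sizes.
    n_train = len(ids) * train_pct // 100
    n_test = len(ids) * test_pct // 100

    return {
        "training": ids[:n_train],
        "testing": ids[n_train:n_train + n_test],
        # Whatever remains goes to the verification set.
        "verification": ids[n_train + n_test:],
    }


# Assumed numbering: forms 1..1000, one form per writer.
splits = split_forms(range(1, 1001))
```

Splitting at the form level, rather than at the paragraph or line level, keeps every sample from a given writer inside a single set, since each form comes from a distinct writer.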
