DHJ: A database of handwritten Jawi for recognition research

This paper proposes a database of handwritten Jawi (DHJ) to support recognition research in this area. A printed Jawi dataset was proposed previously, and this handwritten dataset aims to extend the collection of Jawi databases. The DHJ contains handwritten Jawi characters, words, and sentences. The data were collected from ten writers of different ages, genders, and educational backgrounds. Each participant was asked to fill in a form with three parts: a character part, a word part, and a sentence part. The DHJ is stored in JPEG digital image format, and each image is provided in three different sizes. In total, the database contains 3,810 handwritten Jawi characters, 300 handwritten Jawi words, and 40 handwritten Jawi sentences.
