Abstract: In this paper we present the first Arabic sentence dataset for on-line handwriting recognition written on a tablet PC. The dataset is natural, simple and clear. Texts are sampled from daily newspapers. To collect naturally written handwriting, forms are dictated to the writers. The current version of our dataset includes 154 paragraphs written by 48 writers. It contains more than 3,800 words and more than 19,400 characters. The handwritten texts were mainly written by researchers from different research centers. To use this dataset in a recognition system, word extraction is needed. We therefore also present a new word extraction technique based on the cursive nature of Arabic handwriting. The technique is applied to this dataset and good results are obtained. These results can serve as a benchmark against which future research can be compared.

Keywords: Arabic, handwriting recognition, on-line dataset.

I. INTRODUCTION

Handwriting is one of the most important ways in which civilized people communicate. It is used both for communications between people, whether personal (e.g. letters, notes, addresses on envelopes) or business (e.g. bank checks, tax and business forms), and for communications written to ourselves (e.g. reminders, lists, diaries) [1]. Despite long-standing predictions that handwriting, and even paper itself, would become obsolete in the age of the digital computer, both persist. Handwriting persists because paper and pen are more convenient than keyboards in numerous day-to-day situations. Computers are becoming ubiquitous: more people than ever are forced into contact with them, and our dependence upon them continues to increase. It is therefore essential that they become friendlier to use.
As more of the world’s information processing is done electronically, it becomes more important to make the transfer of information between people and machines simple and reliable. Thus the daily
[1] E. Ratzlaff et al., "Inter-Line Distance Estimation and Text Line Extraction for Unconstrained Online Handwriting," 2004.
[2] Gareth Loudon et al., "A Method for Handwriting Input and Correction on Smartphones," 2004.
[3] Robert Sabourin et al., "Large vocabulary off-line handwritten word recognition," 2002.
[4] Volker Märgner et al., "On-line Arabic handwriting recognition competition," 2011 International Conference on Document Analysis and Recognition, 2011.
[5] Horst Bunke et al., "The IAM-database: an English sentence database for offline handwriting recognition," International Journal on Document Analysis and Recognition, 2002.
[6] S. N. Srihari et al., "Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition," 2002.
[7] Roy Huber et al., "Handwriting Identification: Facts and Fundamentals," 1999.
[8] R. Cole et al., "Survey of the State of the Art in Human Language Technology," 2010.
[9] Tonghua Su et al., "HIT-MW Dataset for Offline Chinese Handwritten Text Recognition," 2006.
[10] Marcus Liwicki et al., "IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard," Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005.
[11] Volker Märgner et al., "ICDAR 2009 Online Arabic Handwriting Recognition Competition," 2009 10th International Conference on Document Analysis and Recognition, 2009.