IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard

In this paper we present IAM-OnDB, a new, large database of online handwritten sentences. It is publicly available and consists of text acquired from a whiteboard via an electronic interface. The database contains about 86K word instances from an 11K-word dictionary, written by more than 200 writers. We also describe a recognizer for unconstrained English text, based on hidden Markov models (HMMs), that was trained and tested on this database. Our experiments show that larger training sets significantly increase the word recognition rate. This recognizer may serve as a benchmark reference for future research.
