A full English sentence database for off-line handwriting recognition

We present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which were subsequently filled out by people in their own handwriting. As of December 1998, the database includes 556 forms produced by approximately 250 different writers. The database consists of full English sentences and could serve as a basis for a variety of handwriting recognition tasks. The main focus, however, is on recognition techniques that use linguistic knowledge beyond the lexicon level. This knowledge can be derived automatically from the corpus, or it can be supplied from external sources.
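
As a minimal sketch of how linguistic knowledge beyond the lexicon level might be derived automatically from corpus text, the Python snippet below estimates a word bigram model with add-one smoothing and uses it to rank candidate transcriptions of one handwritten line. The function names (train_bigrams, log_prob), the toy sentences, and the choice of a smoothed bigram model are illustrative assumptions, not the procedure defined for this database.

```python
from collections import Counter
from math import log


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary tokens.

    Illustrative assumption: plain whitespace tokenization, lowercased.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Unseen bigrams still get a small, nonzero probability.
        total += log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical toy corpus standing in for corpus text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
uni, bi = train_bigrams(corpus)

# Hypothetical alternatives a word recognizer might propose for one line.
candidates = ["the cat sat on the mat", "the mat sat on the cat"]
best = max(candidates, key=lambda s: log_prob(s, uni, bi))
print(best)
```

Both candidates contain the same words, so a lexicon alone cannot separate them; the bigram statistics prefer the word order that was observed in the training text.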
