Handwriting Recognition of Whiteboard Notes - Studying the Influence of Training Set Size and Type

This paper presents a system for the recognition of online whiteboard notes. Notes written on a whiteboard are a new modality in handwriting recognition research that has received relatively little attention so far. For recognition we use an offline HMM-based recognizer, supplemented with methods for processing the online data and generating offline images. The system consists of six main modules: online preprocessing, transformation of online to offline data, offline preprocessing, feature extraction, classification, and post-processing. The recognition rate of our basic recognizer in a writer-independent experiment is 59.5%. By applying state-of-the-art methods, such as optimizing the number of states and Gaussian components, and by including a language model, we achieved a statistically significant increase of the recognition rate to 64.3%. To further improve system performance, we increased the size of the training set, investigating two different strategies. First, we used an existing database of offline handwritten text. Second, we used a recently collected whiteboard database, the IAM-OnDB. With these strategies the recognition rate increased further, up to 68.5%.
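
As an illustration of the second module, the transformation of online to offline data, the following minimal sketch renders pen strokes into a binary image. It is not the method used in the paper, whose details are not given in this abstract. The function name `strokes_to_image`, the fixed image height, the one-pixel line width, and the linear interpolation between sampled pen positions are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) pen position; y is assumed to grow downwards
Stroke = List[Point]          # points recorded between pen-down and pen-up


def strokes_to_image(strokes: List[Stroke], height: int = 32,
                     margin: int = 1) -> List[List[int]]:
    """Render online strokes into a binary offline image (1 = ink, 0 = background).

    Hypothetical helper: scales the writing uniformly so that its vertical
    extent fits the requested height, then connects consecutive pen positions
    with straight one-pixel-wide lines.
    """
    points = [p for stroke in strokes for p in stroke]
    if not points:
        return [[0] * (2 * margin + 1) for _ in range(height)]

    min_x = min(x for x, _ in points)
    max_x = max(x for x, _ in points)
    min_y = min(y for _, y in points)
    max_y = max(y for _, y in points)

    # Uniform scale factor: the vertical span maps onto the rows inside the margin.
    inner_rows = height - 2 * margin - 1
    span_y = max_y - min_y
    scale = inner_rows / span_y if span_y > 0 else 1.0
    width = int(round((max_x - min_x) * scale)) + 2 * margin + 1
    image = [[0] * width for _ in range(height)]

    def to_pixel(p: Point) -> Tuple[int, int]:
        col = int(round((p[0] - min_x) * scale)) + margin
        row = int(round((p[1] - min_y) * scale)) + margin
        return row, col

    for stroke in strokes:
        pixels = [to_pixel(p) for p in stroke]
        if len(pixels) == 1:              # isolated dot
            row, col = pixels[0]
            image[row][col] = 1
        for (r0, c0), (r1, c1) in zip(pixels, pixels[1:]):
            # Linear interpolation between two consecutive pen positions.
            steps = max(abs(r1 - r0), abs(c1 - c0), 1)
            for i in range(steps + 1):
                row = r0 + round((r1 - r0) * i / steps)
                col = c0 + round((c1 - c0) * i / steps)
                image[row][col] = 1
    return image


if __name__ == "__main__":
    # A single diagonal stroke, printed as ASCII art.
    demo = strokes_to_image([[(0.0, 0.0), (10.0, 10.0)]], height=12)
    for line in demo:
        print("".join("#" if v else "." for v in line))
```

An image produced this way could then be passed to the offline preprocessing and feature extraction stages. A real system would presumably also have to choose a stroke thickness and handle noise in the recorded pen positions, which this sketch ignores.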
