PCA in On-Line Handwriting Recognition of Whiteboard Notes: A Novel VQ Design for Use with Discrete HMMs

In this work we further evaluate a recently published, novel vector quantization (VQ) design for discrete HMM-based on-line handwriting recognition of whiteboard notes. To decorrelate the features, a principal component analysis (PCA) is applied. The novel VQ design ensures a lossless representation of the pressure information while modeling the statistical dependencies between the pressure and the remaining features. This is necessary because, as shown in this paper, standard k-means VQ systems cannot quantize this binary feature adequately even when the features have been decorrelated. Our experiments show that the new system provides a relative improvement of r = 2.8 % in character-level accuracy and of r = 3.3 % in word-level accuracy compared to a standard k-means VQ system. Additionally, our system is compared with a state-of-the-art continuous HMM system and shown to be competitive, yielding a relative improvement of r = 1.4 %. A relative improvement of up to r = 0.8 % in word-level accuracy can be reported when using decorrelated features compared to a system omitting the decorrelation.
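The pipeline described above, PCA decorrelation followed by a VQ that keeps the binary pressure feature lossless, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the pressure bit is kept outside the PCA, one k-means codebook is trained per pressure state, and the codebook index is offset by the pressure value. The function names, the codebook size, and the synthetic data are all hypothetical, so this is one possible way to obtain a lossless pressure representation and not necessarily the design evaluated here.

```python
import numpy as np

def pca_fit(X):
    """Return the mean and the eigenvector basis (sorted by variance) of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return mean, eigvecs[:, order]

def pca_transform(X, mean, basis):
    """Project centered data onto the PCA basis (decorrelates the features)."""
    return (X - mean) @ basis

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per row."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; empty clusters keep their previous centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def train_split_vq(features, pressure, k_per_state):
    """Assumed design: one codebook per pressure state (0 = pen up, 1 = pen down)."""
    mean, basis = pca_fit(features)
    Z = pca_transform(features, mean, basis)
    codebooks = {s: kmeans(Z[pressure == s], k_per_state, seed=s) for s in (0, 1)}
    return mean, basis, codebooks

def quantize(features, pressure, model):
    """Map each frame to a discrete symbol; symbol // k recovers the pressure bit."""
    mean, basis, codebooks = model
    Z = pca_transform(features, mean, basis)
    k = len(codebooks[0])
    symbols = np.empty(len(Z), dtype=int)
    for s in (0, 1):
        mask = pressure == s
        if mask.any():
            symbols[mask] = s * k + assign(Z[mask], codebooks[s])
    return symbols

# Synthetic stand-in data (hypothetical): 200 frames, 4 correlated features.
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 4))
features = base @ np.array([[1.0, 0.8, 0.0, 0.0],
                            [0.0, 1.0, 0.5, 0.0],
                            [0.0, 0.0, 1.0, 0.3],
                            [0.0, 0.0, 0.0, 1.0]])
pressure = (np.arange(200) % 3 == 0).astype(int)   # binary pen-state feature

K = 4                                              # codebook size per state
model = train_split_vq(features, pressure, K)
symbols = quantize(features, pressure, model)
decorrelated = pca_transform(features, model[0], model[1])
```

In this sketch the symbol alphabet has 2K entries, and integer division of a symbol by K returns the pressure bit exactly, which is the sense in which the representation is lossless. A single k-means codebook over all features, by contrast, gives no such guarantee, because a centroid may average frames from both pressure states.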
