Combination of N-Grams and Stochastic Context-Free Grammars in an Offline Handwritten Recognition System

Handwritten text recognition is an area of pattern recognition that has recently received considerable attention. Traditionally, handwritten text recognition systems have been built on hidden Markov models (HMMs) and n-gram language models. The drawback of n-grams is that they cannot capture long-term constraints within sentences. Stochastic context-free grammars (SCFGs) can overcome this limitation by rescoring an n-best list generated by the HMM-based recognizer. However, SCFGs are known to be difficult to estimate for complex real tasks. In this work we propose combining n-grams with a category-based SCFG and a distribution of words into categories. The category-based approach is intended to simplify SCFG inference while preserving the descriptive power of the model. Results on the IAM-Database show that this combined scheme outperforms the classical scheme based on HMMs and n-grams alone.
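The n-best rescoring idea can be illustrated with a minimal sketch. The abstract does not state how the n-gram and SCFG scores are combined, so the sentence-level linear interpolation used here is an assumption, as are all names (`rescore_nbest`, `interpolate_log`, `alpha`, `lm_weight`) and the two scoring callbacks, which stand in for a trained n-gram model and a category-based SCFG.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (word sequence, HMM log-score). Hypothetical layout.
Hypothesis = Tuple[Sequence[str], float]
Scorer = Callable[[Sequence[str]], float]  # returns a log-probability


def interpolate_log(lp_a: float, lp_b: float, alpha: float) -> float:
    """Return log(alpha*exp(lp_a) + (1-alpha)*exp(lp_b)), computed stably."""
    terms = []
    if alpha > 0.0:
        terms.append(math.log(alpha) + lp_a)
    if alpha < 1.0:
        terms.append(math.log(1.0 - alpha) + lp_b)
    top = max(terms)
    if top == float("-inf"):
        return top  # both models assign zero probability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def rescore_nbest(
    nbest: List[Hypothesis],
    ngram_logprob: Scorer,
    scfg_logprob: Scorer,
    alpha: float = 0.5,      # assumed interpolation weight
    lm_weight: float = 1.0,  # assumed language-model scale factor
) -> List[Tuple[float, List[str]]]:
    """Re-rank HMM hypotheses using an interpolated n-gram/SCFG score."""
    scored = []
    for words, hmm_score in nbest:
        lm = interpolate_log(ngram_logprob(words), scfg_logprob(words), alpha)
        scored.append((hmm_score + lm_weight * lm, list(words)))
    # Highest combined log-score first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

In this sketch a hypothesis that the HMM slightly prefers can still be overtaken by one that the grammar finds far more plausible, which is the effect the rescoring step is meant to achieve. In practice `alpha` and `lm_weight` would be tuned on held-out data.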
