The use of a hidden Markov model (HMM) of language syntax to improve the performance of a text recognition algorithm is proposed. Syntactic constraints are described by transition probabilities between word classes, and the confusion between the feature string for a word and the various syntactic classes is also modeled probabilistically. A modification of the Viterbi algorithm is proposed that finds a fixed number of syntactic-class sequences for a given sentence with the highest probabilities of occurrence, given the feature strings for the words. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. The use of first- and second-order transition probabilities is explored. Overall, a 65 to 80 percent reduction in the average number of words that can match a given image is achieved.
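To make the modified Viterbi step concrete, the sketch below shows one common way to find the N highest-probability class sequences: keep up to N scored partial paths per syntactic class at each word position instead of a single best path. This is a minimal illustration assuming a first-order HMM; the class labels, probability values, and the function name `n_best_viterbi` are hypothetical and not taken from the paper.

```python
# Minimal N-best (list) Viterbi sketch over syntactic word classes.
# Assumes a first-order HMM; all names and numbers are illustrative.
import heapq

def n_best_viterbi(obs_probs, trans, init, n):
    """Return the n most probable class sequences.

    obs_probs : list of dicts, obs_probs[t][c] = P(feature string at t | class c)
    trans     : dict of dicts, trans[p][c] = P(class c follows class p)
    init      : dict, init[c] = P(sentence starts with class c)
    n         : number of best partial paths kept per class
    """
    classes = list(init)
    # paths[c] holds up to n (probability, sequence) pairs ending in class c
    paths = {c: [(init[c] * obs_probs[0].get(c, 0.0), (c,))] for c in classes}
    for t in range(1, len(obs_probs)):
        new_paths = {}
        for c in classes:
            candidates = [
                (p * trans[prev][c] * obs_probs[t].get(c, 0.0), seq + (c,))
                for prev in classes
                for p, seq in paths[prev]
            ]
            # keep only the n highest-probability extensions per class
            new_paths[c] = heapq.nlargest(n, candidates)
        paths = new_paths
    # merge the per-class lists and return the overall top n sequences
    return heapq.nlargest(n, (pair for lst in paths.values() for pair in lst))

# Toy usage with two classes; the probabilities are made up.
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
init = {"NOUN": 0.6, "VERB": 0.4}
obs = [{"NOUN": 0.9, "VERB": 0.1}, {"NOUN": 0.2, "VERB": 0.8}]
for prob, seq in n_best_viterbi(obs, trans, init, 2):
    print(prob, seq)
```

Because each class retains its own n-best list at every position, the merge at the end recovers the top sequences overall rather than only the single maximum-probability path, which is what allows the recognizer to rank several candidate class sequences for a sentence.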