The use of letter patterns for script recognition

This paper addresses the problem of script recognition when the input from a pattern recognizer is ambiguous. A pattern recognizer produces a number of letter candidates for each letter position of the word it processes. These letter candidates combine to form a number (usually large, often very large) of letter string candidates for each written input word. The paper considers methods of using orthographic information (letter patterns) to reduce this uncertainty. Letter string candidates may be rejected if they contain letter sequences that are not allowable in English (checked using n-grams), or if they are not real English words. The major problem with using n-grams in this way is that the candidate strings remaining after look-up are not necessarily words. Better reduction is obtained by comparing the candidate strings with a list of words, which can be taken from a machine-readable dictionary. The remaining allowable candidates can then be ranked by the probability, as reported by the recognizer, that each is correct. Past systems employing lexical look-up have found it difficult to hold a reasonably large vocabulary in memory while keeping it searchable in real time. Alternative data structures for representing large lists of words are discussed in the paper.
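
The pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the recognizer output, the probabilities, the four-word lexicon, the use of bigrams, and all function names are assumptions made up for illustration. The trie at the end is one common structure for storing word lists, shown here only as a plausible example of the kind of data structure the abstract alludes to.

```python
from itertools import product

# Hypothetical recognizer output: one list of (letter, probability)
# candidates per letter position. All values are invented for this sketch.
candidates = [
    [("c", 0.6), ("e", 0.4)],
    [("a", 0.7), ("o", 0.3)],
    [("t", 0.8), ("l", 0.2)],
]

# Toy lexicon (assumed) and the set of bigrams that occur in it.
lexicon = {"cat", "cot", "eat", "pal"}
allowed_bigrams = {w[i:i + 2] for w in lexicon for i in range(len(w) - 1)}


def letter_strings(cands):
    """Yield every letter string with the product of its letter probabilities."""
    for combo in product(*cands):
        word = "".join(letter for letter, _ in combo)
        score = 1.0
        for _, p in combo:
            score *= p
        yield word, score


def passes_bigrams(word, bigrams):
    """True if every adjacent letter pair in the word is an allowed bigram."""
    return all(word[i:i + 2] in bigrams for i in range(len(word) - 1))


def build_trie(words):
    """Store words in a nested-dict trie; "$" marks the end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def trie_search(cands, trie):
    """Expand candidates position by position, dropping any prefix that
    is not in the trie. Assumes candidate letters never equal "$"."""
    frontier = [("", 1.0, trie)]
    for position in cands:
        frontier = [
            (prefix + letter, score * p, node[letter])
            for prefix, score, node in frontier
            for letter, p in position
            if letter in node
        ]
    return [(w, s) for w, s, node in frontier if "$" in node]


all_strings = list(letter_strings(candidates))        # 2 * 2 * 2 strings
bigram_ok = [(w, s) for w, s in all_strings
             if passes_bigrams(w, allowed_bigrams)]   # may keep non-words
in_lexicon = [(w, s) for w, s in all_strings if w in lexicon]
ranked = sorted(in_lexicon, key=lambda ws: ws[1], reverse=True)
via_trie = trie_search(candidates, build_trie(lexicon))
```

In this toy example the bigram filter still admits strings such as "cal" and "eal", whose letter pairs all occur in the lexicon although the strings themselves do not, while the lexicon check keeps only "cat", "eat" and "cot". The trie search reaches the same words without first enumerating every letter string, because it abandons a prefix as soon as no word in the list begins with it.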