Semantic Information Extraction with a Thesaurus for Visual Word Recognition
A structural representation of semantic information is proposed as a way to improve the performance of a text recognition algorithm. Semantic constraints are modeled by the graph of connections in a thesaurus, where the thesaurus consists of root words that point to lists of related words. The alternatives produced by a word hypothesization routine are looked up in the thesaurus, and activation scores for related words are incremented. In a second pass, the neighborhood for each word is sorted by activation score and a threshold is applied. Since a passage of text is usually about a single topic, the activation scores provide a method of grouping semantically related alternatives. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. The word recognition alternatives for nouns are used in two recursive thesaurus lookups. The resulting activation values are used to threshold the number of alternatives that can match each word. It is shown that a reduction of 12 to 20 percent in average neighborhood size can be achieved with a one percent error rate.
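The two-pass procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy thesaurus, the function names `activation_scores` and `prune`, the reading of "two recursive lookups" as a lookup depth of 2, the unit increment per lookup, and the fallback that keeps a neighborhood intact when no alternative passes the threshold.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: each root word points to a list of related words.
THESAURUS = {
    "ship": ["boat", "vessel", "harbor"],
    "boat": ["ship", "vessel"],
    "harbor": ["port", "ship"],
    "shop": ["store", "market"],
}


def activation_scores(neighborhoods, thesaurus, depth=2):
    """First pass: look up every alternative and increment related words.

    `depth=2` is an assumed reading of "two recursive thesaurus lookups":
    the words related to an alternative are themselves looked up once more.
    """
    scores = defaultdict(int)
    for alternatives in neighborhoods:
        for word in alternatives:
            frontier = [word]
            for _ in range(depth):
                next_frontier = []
                for root in frontier:
                    for related in thesaurus.get(root, []):
                        scores[related] += 1
                        next_frontier.append(related)
                frontier = next_frontier
    return scores


def prune(neighborhoods, scores, threshold):
    """Second pass: sort each neighborhood by activation and apply a threshold."""
    pruned = []
    for alternatives in neighborhoods:
        ranked = sorted(alternatives, key=lambda w: scores.get(w, 0), reverse=True)
        kept = [w for w in ranked if scores.get(w, 0) >= threshold]
        # Assumption: if nothing passes, keep the full ranked neighborhood
        # rather than discard every candidate for that word position.
        pruned.append(kept or ranked)
    return pruned


# Hypothetical neighborhoods: one list of recognition alternatives per word.
neighborhoods = [["ship", "shop", "chip"], ["boat", "boot"], ["harbor", "harder"]]
scores = activation_scores(neighborhoods, THESAURUS)
reduced = prune(neighborhoods, scores, threshold=1)
```

In this toy passage the nautical alternatives activate one another through the thesaurus, while unrelated alternatives such as "shop" or "boot" receive no activation and fall below the threshold. The threshold value would in practice be tuned to trade neighborhood size against the error rate of discarding the correct word.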