Paper Information - Combining Trigram and Winnow in Thai OCR Error Correction

Combining Trigram and Winnow in Thai OCR Error Correction

Because of specific characteristics of Thai, Thai OCR errors frequently depend on nearby characters. To capture this property more appropriately, we propose using a varied n-gram of the character confusion probability to score approximately matched words, where the value of n depends on the characteristics of each character. For languages that have no explicit word boundaries, word boundary ambiguity has to be resolved before errors can be corrected. In this paper, a maximal matching algorithm is used instead of a more complicated word segmentation algorithm to reduce time complexity. Finally, a hybrid method that combines a part-of-speech trigram model with the Winnow algorithm is used to select the most probable correction.
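The maximal matching step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which, among all ways of covering the input with dictionary words, the segmentation with the fewest tokens is chosen. The function name `maximal_matching`, the toy English dictionary, and the fallback of treating an unmatched character as a one-character token are assumptions made for illustration. The paper applies matching to approximately matched words from OCR output, which this exact-match sketch does not attempt.

```python
def maximal_matching(text, dictionary, max_word_len=None):
    """Segment `text` into the fewest tokens (hypothetical helper).

    A token is either a word found in `dictionary` or, as an assumed
    fallback, a single character that no dictionary word covers.
    """
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)

    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest tokens covering text[:i]
    back = [0] * (n + 1)    # back[i]: start index of the last token
    best[0] = 0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            piece = text[start:end]
            # Accept dictionary words, plus any single character as a fallback.
            if piece in dictionary or end - start == 1:
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start

    # Walk the back-pointers from the end to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


if __name__ == "__main__":
    toy_dictionary = {"the", "them", "mend"}  # toy data, not from the paper
    print(maximal_matching("themend", toy_dictionary))
```

On the toy input, a greedy longest-match scan would take "them" first and then fall back to single characters, whereas minimizing the token count yields "the" followed by "mend". The dynamic program examines at most `max_word_len` candidate words per position, which keeps the cost low compared with a more elaborate segmentation model.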

[1] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[2] Karen Kukich, et al. Techniques for automatically correcting words in text, 1992, CSUR.

[3] Andrew R. Golding, et al. A Bayesian Hybrid Method for Context-sensitive Spelling Correction, 1996, VLC@ACL.

[4] Xiang Tong, et al. A Statistical Approach to Automatic OCR Error Correction in Context, 1996, VLC@COLING.

[5] Kemal Oflazer, et al. Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction, 1995, CL.

[6] Dan Roth, et al. Applying Winnow to Context-Sensitive Spelling Correction, 1996, ICML.

[7] Kemal Oflazer. Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction, 1996.

[8] Masaaki Nagata. Context-Based Spelling Correction for Japanese OCR, 1996, COLING.

[9] Peter Ingels, et al. Connected Text Recognition Using Layered HMMs and Token Passing, 1996, ArXiv.

[10] Yves Schabes, et al. Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction, 1996, ACL.

[11] Surapant Meknavin, et al. Feature-based Thai Word Segmentation, 1997.

[12] Combining Trigram and Winnow in Thai OCR Error Correction, 1998, COLING-ACL.