Paper information - Error correction in a Chinese OCR test collection

Error correction in a Chinese OCR test collection

This article proposes a technique for correcting Chinese OCR errors to support retrieval of scanned documents. The approach is fully automatic: it identifies both keywords and confusable terms without manually constructed lexicons or confusion resources. An experiment with single-term queries demonstrates improved retrieval effectiveness.

Yuen-Hsien Tseng

[1] Yuen-Hsien Tseng. Automatic cataloguing and searching for retrospective data by use of OCR text, 2001.

[2] Yuen-Hsien Tseng, et al. Content-based retrieval for music collections, 1999, SIGIR '99.

[3] Yuen-Hsien Tseng, et al. Automatic thesaurus generation for Chinese documents, 2002, J. Assoc. Inf. Sci. Technol.

[4] Kazem Taghva, et al. The Effects of Noisy Data on Text Retrieval, 1994, J. Am. Soc. Inf. Sci.