Error correction in a Chinese OCR test collection
This article proposes a technique for correcting Chinese OCR errors to support retrieval of scanned documents. The approach is fully automatic: it identifies both keywords and confusable terms without manually constructed lexicons or confusion resources. A single-term query experiment demonstrates improved retrieval effectiveness.