Paper information - Automatic thesaurus generation for Chinese documents

Automatic thesaurus generation for Chinese documents

This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is first presented. Experiments showed that, for each document, an average of 33% of the keywords identified by this algorithm were unknown to a lexicon of 123,226 terms. Of these unregistered words, only 8.3% are illegal. Keywords extracted from each document are further filtered for term association analysis. Association weights larger than a threshold are then accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically while achieving a similar level of term relatedness.
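The accumulation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the association weight formula, so the Dice-style ratio below is an assumption, as are the function names `accumulate_similarities` and `related_terms`, the default `threshold`, and the input format (one keyword-to-frequency mapping per document, produced by an earlier extraction and filtering step).

```python
from collections import defaultdict
from itertools import combinations


def accumulate_similarities(docs, threshold=0.5):
    """Sum per-document association weights into term pair similarities.

    docs: list of dicts mapping keyword -> frequency within one document
          (assumed to come from a prior extraction and filtering step).
    threshold: per-document weights at or below this value are discarded.
    """
    totals = defaultdict(float)
    for keywords in docs:
        # Sorting keeps each pair in a canonical (a, b) order across documents.
        for a, b in combinations(sorted(keywords), 2):
            fa, fb = keywords[a], keywords[b]
            # Assumed weight: a Dice-style ratio of the two term frequencies.
            # The paper's actual formula is not stated in the abstract.
            weight = 2 * min(fa, fb) / (fa + fb)
            if weight > threshold:
                totals[(a, b)] += weight
    return dict(totals)


def related_terms(similarities, term, top_n=5):
    """Return up to top_n (other_term, score) pairs related to term."""
    scored = []
    for (a, b), score in similarities.items():
        if term == a:
            scored.append((b, score))
        elif term == b:
            scored.append((a, score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

Because each document is processed independently and only pairs within a single document are compared, the cost grows with the number of filtered keywords per document rather than with the vocabulary of the whole collection, which is one plausible reading of why such a scheme could be fast.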

Yuen-Hsien Tseng

[1] Gerard Salton, et al. Experiments in Automatic Thesaurus Construction for Information Retrieval, 1971, IFIP Congress.

[2] Hsinchun Chen, et al. A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[3] Ido Dagan, et al. Mining Text Using Keyword Distributions, 1998, Journal of Intelligent Information Systems.

[4] Hsin-Hsi Chen, et al. Identification and Classification of Proper Nouns in Chinese Texts, 1996, COLING.

[5] Yuen-Hsien Tseng, et al. Content-based retrieval for music collections, 1999, SIGIR '99.

[6] Hsin-Hsi Chen, et al. Construction of a Chinese-English WordNet and its application to CLIR, 2000, IRAL '00.

[7] Yuen-Hsien Tseng. Multilingual keyword extraction for term suggestion, 1998, SIGIR '98.

[8] Key-Sun Choi, et al. Automatic thesaurus construction using Bayesian networks, 1995, CIKM '95.

[9] Carolyn J. Crouch, et al. Experiments in automatic statistical thesaurus construction, 1992, SIGIR '92.

[10] Hsinchun Chen, et al. Automatic Thesaurus Generation for an Electronic Community System, 1995, J. Am. Soc. Inf. Sci.

[11] Gerard Salton, et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.