Building a Chinese-English wordnet for translingual applications

A WordNet-like linguistic resource is useful but difficult to construct. This article proposes a method that integrates five linguistic resources: English and Chinese sense-tagged corpora, English and Chinese thesauruses, and a bilingual dictionary. Chinese words are mapped into WordNet, and a Chinese WordNet and a Chinese-English WordNet are derived by following the structure of WordNet. Chinese-English information retrieval experiments are conducted to evaluate the applicability of the Chinese-English WordNet. The best model achieves an average precision of 0.1010, which is 69.23% of the performance of monolingual information retrieval. It also gains a 10.02% relative increase over a model that resolves the translation ambiguity and target polysemy problems together.
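To make the two relative figures easier to read, the short sketch below back-computes the reference scores they imply. It assumes that "69.23% of monolingual" means the ratio of the best model's average precision to the monolingual average precision, and that the 10.02% gain is relative to the comparison model's average precision. The function and variable names are illustrative only, and the derived values are implied by the abstract rather than reported in it.

```python
def implied_reference(score: float, ratio: float) -> float:
    """Return the reference score r such that score == ratio * r."""
    return score / ratio


# Figures quoted in the abstract.
best_ap = 0.1010            # average precision of the best model
share_of_mono = 0.6923      # best model as a fraction of monolingual retrieval
relative_gain = 0.1002      # relative increase over the comparison model

# Assumption: both percentages are simple ratios of average precision.
implied_monolingual_ap = implied_reference(best_ap, share_of_mono)
implied_baseline_ap = implied_reference(best_ap, 1.0 + relative_gain)

print(f"implied monolingual AP: {implied_monolingual_ap:.4f}")
print(f"implied comparison-model AP: {implied_baseline_ap:.4f}")
```

Under these assumptions the monolingual run would sit at roughly 0.146 average precision and the comparison model at roughly 0.092.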