利用雙語學術名詞庫抽取中英字詞互譯及詞義解歧 (Sense Extraction and Disambiguation for Chinese Words from Bilingual Terminology Bank) [In Chinese]

Using lexical semantic knowledge to solve natural language processing problems has become increasingly popular in recent years. Because semantic processing relies heavily on such knowledge, constructing lexical semantic databases has become an urgent task. WordNet is currently the best-known English semantic knowledge database, and many studies of word sense disambiguation adopt it as a standard. Following the success of WordNet, there is a trend toward constructing WordNets for other languages. In this paper, we propose a methodology for constructing a Chinese WordNet by extracting information from a bilingual terminology bank. We first developed a word-to-word alignment algorithm to extract English-Chinese translation-equivalent word pairs. The algorithm then disambiguates word senses and maps Chinese word senses to WordNet synsets. In the word-to-word alignment experiment, the alignment algorithm achieves an F-score of 98.4%. In the word sense disambiguation experiment, the extracted senses cover 36.89% of WordNet synsets, and the three proposed disambiguation rules achieve accuracies of 80%, 83%, and 87%, respectively.
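
To make the word-to-word alignment step concrete, the following is a minimal sketch of how translation-equivalent word pairs might be extracted from term-bank entries. The abstract does not specify the alignment model, so the sketch substitutes IBM Model 1-style EM training as a stand-in; it is not the authors' algorithm. The names `PAIRS`, `train_model1`, and `align`, the toy entries, and the assumption that the Chinese side is already segmented are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy term-bank entries: (English words, pre-segmented Chinese words).
PAIRS = [
    (["binary", "tree"], ["二元", "樹"]),
    (["binary", "search"], ["二元", "搜尋"]),
    (["search", "tree"], ["搜尋", "樹"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t(c | e) with IBM Model 1-style EM (no NULL word; an assumption)."""
    en_vocab = {e for en, _ in pairs for e in en}
    zh_vocab = {c for _, zh in pairs for c in zh}
    # Start from a uniform distribution over the Chinese vocabulary.
    t = {e: {c: 1.0 / len(zh_vocab) for c in zh_vocab} for e in en_vocab}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: distribute each Chinese word over the English words of its entry.
        for en, zh in pairs:
            for c in zh:
                norm = sum(t[e][c] for e in en)
                for e in en:
                    frac = t[e][c] / norm
                    count[e][c] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        for e in en_vocab:
            for c in zh_vocab:
                t[e][c] = count[e][c] / total[e]
    return t


def align(en, zh, t):
    """Link each Chinese word to the English word that most probably generated it."""
    return [(max(en, key=lambda e: t[e][c]), c) for c in zh]


if __name__ == "__main__":
    table = train_model1(PAIRS)
    for en, zh in PAIRS:
        print(align(en, zh, table))
```

Each aligned pair produced this way could then serve as input to the sense disambiguation step, where the English side is used to look up candidate WordNet synsets for the Chinese word.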
