Extraction of Bilingual Terminology from a Multilingual Web-based Encyclopedia

With the growing demand for bilingual dictionaries that cover domain-specific terminology, automatic dictionary extraction has become an active research area. However, dictionaries built from bilingual text corpora often lack sufficient accuracy and coverage for domain-specific terms. We therefore present an approach for extracting bilingual dictionaries from the link structure of Wikipedia, a large-scale encyclopedia that contains a vast number of links between articles in different languages. Our method not only analyzes these interlanguage links but also extracts additional translations from redirect pages and link texts. In a detailed experiment, we show that combining redirect page and link text information yields much better results than the traditional approach of extracting bilingual terminology from parallel corpora.
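The combination of information sources described in the abstract can be pictured as follows. An interlanguage link ties a source-language article to its target-language counterpart, and on each side the article title, the redirect pages pointing to it, and the link texts used to reach it all act as surface forms of the same concept. The sketch below is a minimal illustration under stated assumptions. The toy data and the names `INTERLANGUAGE`, `REDIRECTS`, `LINK_TEXTS`, `surface_forms`, and `build_dictionary` are hypothetical. Ranking target terms by raw occurrence counts is also an illustrative choice, not necessarily the scoring used in this work.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; a real system would parse these from Wikipedia dumps.
# Interlanguage links: English article title -> Japanese article title.
INTERLANGUAGE = {"Car": "自動車"}

# Redirect pages per language edition: redirect title -> canonical article title.
REDIRECTS = {
    "en": {"Automobile": "Car", "Motorcar": "Car"},
    "ja": {"クルマ": "自動車"},
}

# Link texts per language edition: (anchor text, linked article title).
LINK_TEXTS = {
    "en": [("car", "Car"), ("cars", "Car"), ("automobile", "Car")],
    "ja": [("自動車", "自動車"), ("車", "自動車")],
}


def surface_forms(lang, article):
    """Count every term that names `article`: its title, redirects, and link texts."""
    forms = Counter({article: 1})  # the article title itself
    for redirect, target in REDIRECTS.get(lang, {}).items():
        if target == article:
            forms[redirect] += 1
    for anchor, target in LINK_TEXTS.get(lang, []):
        if target == article:
            forms[anchor] += 1
    return forms


def build_dictionary(src="en", tgt="ja"):
    """Map each source-language term to target terms, weighted by assumed raw counts."""
    dictionary = defaultdict(Counter)
    for src_article, tgt_article in INTERLANGUAGE.items():
        tgt_forms = surface_forms(tgt, tgt_article)
        # Every surface form of the source article inherits all target-side forms.
        for src_term in surface_forms(src, src_article):
            dictionary[src_term].update(tgt_forms)
    return dictionary
```

For example, `build_dictionary()["automobile"].most_common(1)` returns `[("自動車", 2)]`, since the Japanese title is counted once as a title and once as a link text. Without the redirect and link text sources, only the pair "Car" and "自動車" would be found.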
