Paper information - Automatic extraction of bilingual word pairs from parallel corpora with various languages using learning for adjacent information

This paper presents a learning method that uses adjacent information to extract bilingual word pairs efficiently from parallel corpora in various languages for which language resources are insufficient. In our method, information about the correspondence between source language words and target language words is acquired automatically using the word strings that adjoin bilingual word pairs. The acquired information is used to resolve ambiguity in the correspondence between source language words and target language words in various bilingual sentence pairs. First, the system using our method automatically acquires templates as information that indicates the correspondence between source language words and target language words. The templates are based on word strings that adjoin the bilingual word pairs. The system then uses the acquired templates to extract bilingual word pairs efficiently from bilingual sentence pairs. In evaluation experiments, the system using our method extracted bilingual word pairs from parallel corpora in five languages. The total extraction rate was 60.1%, which is 8.0 percentage points higher than that obtained using a system based only on the Dice coefficient without our method. These results confirm the effectiveness of our method. © 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(13): 40–53, 2006; Published online in Wiley InterScience. DOI 10.1002/scj.20534
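The baseline named in the abstract is the Dice coefficient. As a rough illustration only, the sketch below computes the standard sentence-level form, 2·f(s,t) / (f(s) + f(t)), where f counts the sentence pairs in which a word (or word pair) occurs. The function name `dice_scores`, the tokenization, and the toy corpus are assumptions made for this example; the abstract does not specify how the authors' baseline system computes or thresholds the score.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Return {(src_word, tgt_word): Dice score} for co-occurring word pairs.

    sentence_pairs: iterable of (source_tokens, target_tokens) tuples.
    Frequencies are counted once per sentence pair (hypothetical choice).
    """
    src_freq = Counter()
    tgt_freq = Counter()
    pair_freq = Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        # Every source/target combination in the pair co-occurs once.
        pair_freq.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in pair_freq.items()
    }


# Toy corpus invented for illustration; it is not data from the paper.
corpus = [
    (["i", "like", "tea"], ["watashi", "wa", "ocha", "ga", "suki"]),
    (["i", "like", "coffee"], ["watashi", "wa", "koohii", "ga", "suki"]),
    (["tea", "is", "hot"], ["ocha", "wa", "atsui"]),
]
scores = dice_scores(corpus)
```

In this toy data, `tea` pairs cleanly with `ocha`, but `i` receives the same top score against `watashi`, `ga`, and `suki`. Ties of this kind are one form of the correspondence ambiguity that the abstract says the adjacent-information templates are meant to resolve.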

Hiroshi Echizen-ya | Kenji Araki | Yoshio Momouchi
