Paper Information - Extraction of Lexical Translations from Non-Aligned Corpora

Extraction of Lexical Translations from Non-Aligned Corpora

A method for extracting lexical translations from non-aligned corpora is proposed to cope with the unavailability of large aligned corpora. The assumption that "translations of two co-occurring words in a source language also co-occur in the target language" is adopted and represented in a stochastic matrix formulation. The translation matrix maps co-occurrence information from the source language into the target language. Once the ambiguity of the translational relation is resolved, this translated co-occurrence information should resemble the co-occurrence information observed in the target language itself. An algorithm to obtain the best translation matrix is introduced. Experiments were performed to evaluate the effectiveness of the ambiguity resolution and the refinement of the dictionary.
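The idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the form T^T A T for the translated co-occurrence matrix, the absolute-difference distance, and all names and toy values below are assumptions made for illustration only. A is a source-language co-occurrence matrix, B a target-language one, and T a row-stochastic translation matrix whose row i gives the translation weights of source word i.

```python
def transpose(x):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*x)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def translate_cooccurrence(a, t):
    """Map source co-occurrence matrix `a` into the target vocabulary.

    Assumed form: T^T A T (hypothetical, chosen for this sketch).
    """
    return matmul(matmul(transpose(t), a), t)


def distance(x, y):
    """Sum of absolute element-wise differences between two matrices."""
    return sum(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


# Toy data, invented for illustration.
# Source words: s0, s1. Target words: t0, t1, t2.
A = [[0.0, 1.0],
     [1.0, 0.0]]            # s0 and s1 co-occur
B = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]       # t0 and t1 co-occur; t2 co-occurs with neither

# s1 translates only to t1; s0 is ambiguous between t0 and t2.
T_ambiguous = [[0.5, 0.0, 0.5],
               [0.0, 1.0, 0.0]]
# The same dictionary after the ambiguity is resolved in favour of t0.
T_resolved = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]

d_ambiguous = distance(translate_cooccurrence(A, T_ambiguous), B)
d_resolved = distance(translate_cooccurrence(A, T_resolved), B)
```

In this toy setting the resolved matrix reproduces B exactly, while the ambiguous one leaves a nonzero distance. A search over candidate matrices T that lowers this distance would be one way to read "obtain the best translation matrix", under the assumptions stated above.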

Kumiko Tanaka-Ishii | Hideya Iwasaki

[1] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[2] Reinhard Rapp, et al. Identifying Word Translations in Non-Parallel Texts, 1995, ACL.

[3] J. Jenkins, et al. Word association norms, 1964.

[4] Pascale Fung, et al. A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora, 1995, ACL.

[5] Kumiko Tanaka-Ishii, et al. Construction of a Bilingual Dictionary Intermediated by a Third Language, 1994, COLING.

[6] Pascale Fung, et al. A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora, 1995, ACL.

[7] Ted Dunning, et al. Accurate Methods for the Statistics of Surprise and Coincidence, 1993, CL.

[8] Yuji Matsumoto, et al. Bilingual Text Matching using Bilingual Dictionary and Statistics, 1994, COLING.

[9] Alon Itai, et al. Word Sense Disambiguation Using a Second Language Monolingual Corpus, 1994, CL.

[10] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.