Structural Feature Selection For English-Korean Statistical Machine Translation

When aligning texts in very different languages such as Korean and English, structural features beyond words and phrases provide useful information. In this paper, we present a method for selecting structural features of the two languages, from which we construct a model that assigns conditional probabilities to corresponding tag sequences in bilingual English-Korean corpora. For tag sequence mapping between the two languages, we first define a structural feature function that represents statistical properties of the empirical distribution of a set of training samples. The system, based on the maximum entropy principle, selects only those features that produce large increases in the log-likelihood of the training samples. These structurally mapped features provide more informative knowledge for statistical machine translation between English and Korean. The information can also help reduce the parameter space of statistical alignment by eliminating syntactically unlikely alignments.
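The abstract only states that features are kept when they yield a large log-likelihood gain. A minimal sketch of that idea, assuming the greedy approximate-gain selection for a conditional maximum entropy model described by Berger et al. (1996), might look like the following. Everything concrete here is hypothetical and not taken from the paper: the toy tag-sequence pairs, the co-occurrence feature template (`make_feature`), the bisection line search (`best_alpha`), `select_features`, and the thresholds `min_gain` and `max_features`.

```python
import math
from itertools import product

# Hypothetical toy corpus of (English tag sequence, Korean tag sequence) pairs.
# Tags and pairs are illustrative only, not data from the paper.
SAMPLES = [
    (("DT", "NN"), ("NNG", "JKS")),
    (("DT", "NN"), ("NNG", "JKS")),
    (("VB", "NN"), ("NNG", "JKO", "VV")),
    (("VB", "NN"), ("NNG", "JKO", "VV")),
    (("JJ", "NN"), ("VA", "ETM", "NNG")),
]

def make_feature(e_tag, k_tag):
    """Hypothetical binary structural feature f(x, y): fires when the English
    sequence x contains e_tag and the Korean sequence y contains k_tag."""
    def f(x, y):
        return 1.0 if e_tag in x and k_tag in y else 0.0
    f.name = (e_tag, k_tag)
    return f

def cond_probs(x, outputs, features, weights, extra=None, alpha=0.0):
    """p(y|x) over all candidate outputs y for a log-linear (maxent) model,
    optionally with one extra candidate feature weighted by alpha."""
    scores = []
    for y in outputs:
        s = sum(w * f(x, y) for f, w in zip(features, weights))
        if extra is not None:
            s += alpha * extra(x, y)
        scores.append(s)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(outputs, exps)}

def log_likelihood(samples, outputs, features, weights, extra=None, alpha=0.0):
    """Conditional log-likelihood of the training samples."""
    return sum(math.log(cond_probs(x, outputs, features, weights, extra, alpha)[y])
               for x, y in samples)

def best_alpha(samples, outputs, features, weights, g, lo=-20.0, hi=20.0, iters=60):
    """Line search for the new feature's weight with the other weights fixed.
    The log-likelihood is concave in alpha, so its derivative (empirical minus
    expected count of g) is non-increasing; bisection finds the root in [lo, hi]."""
    def grad(a):
        d = 0.0
        for x, y in samples:
            p = cond_probs(x, outputs, features, weights, g, a)
            d += g(x, y) - sum(p[yy] * g(x, yy) for yy in outputs)
        return d
    if grad(hi) > 0:
        return hi  # optimum lies beyond the bound
    if grad(lo) < 0:
        return lo
    for _ in range(iters):
        mid = (lo + hi) / 2
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def select_features(samples, candidates, min_gain=0.01, max_features=5):
    """Greedily add the candidate with the largest approximate gain in
    log-likelihood; stop when the best gain falls below min_gain."""
    outputs = sorted({y for _, y in samples})
    features, weights, pool = [], [], list(candidates)
    base = log_likelihood(samples, outputs, features, weights)
    while pool and len(features) < max_features:
        scored = []
        for g in pool:
            a = best_alpha(samples, outputs, features, weights, g)
            gain = log_likelihood(samples, outputs, features, weights, g, a) - base
            scored.append((gain, a, g))
        gain, a, g = max(scored, key=lambda t: t[0])
        if gain < min_gain:
            break
        features.append(g)
        weights.append(a)
        pool.remove(g)
        base = log_likelihood(samples, outputs, features, weights)
    return [(f.name, w) for f, w in zip(features, weights)]

# Candidate pool: every (English tag, Korean tag) co-occurrence indicator.
E_TAGS = sorted({t for x, _ in SAMPLES for t in x})
K_TAGS = sorted({t for _, y in SAMPLES for t in y})
CANDIDATES = [make_feature(e, k) for e, k in product(E_TAGS, K_TAGS)]

if __name__ == "__main__":
    for name, w in select_features(SAMPLES, CANDIDATES):
        print(name, round(w, 2))
```

On this toy data the selected features pair English tags with the Korean tags that co-occur with them. Features such as (NN, NNG), which fire on every candidate output, cancel out of p(y|x) and gain nothing. Because the toy pairs are perfectly separable, the line search stops at its bound; a practical system would refit all weights after each addition (e.g. with iterative scaling) and regularize them.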

Juntae Yoon | Mansuk Song | Seonho Kim
