A Maximum Entropy/Minimum Divergence Translation Model

I present empirical comparisons between a linear combination of standard statistical language and translation models and an equivalent Maximum Entropy/Minimum Divergence (MEMD) model, using several different methods for automatic feature selection. The MEMD model significantly outperforms the standard model in test corpus perplexity, even though it has far fewer parameters.
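The two model families being compared can be illustrated with a minimal sketch. It assumes the usual textbook forms, which the abstract itself does not spell out: the linear model interpolates a language-model distribution and a translation-model distribution with a weight `lam`, while the MEMD model rescales a reference distribution `q` by exponentiated feature weights and renormalizes. All names, toy probabilities, and the single word-pair feature below are hypothetical and chosen only for illustration.

```python
import math

def linear_mix(p_lm, p_tm, lam):
    """Linear combination: lam * p_lm(w) + (1 - lam) * p_tm(w) for each word w."""
    return {w: lam * p_lm[w] + (1.0 - lam) * p_tm[w] for w in p_lm}

def memd(q, features, weights):
    """MEMD form: p(w) = q(w) * exp(sum_i weights[i] * f_i(w)) / Z."""
    scores = {}
    for w in q:
        active = features.get(w, {})  # features that fire for word w
        total = sum(weights[name] * value for name, value in active.items())
        scores[w] = q[w] * math.exp(total)
    z = sum(scores.values())  # normalizer over the vocabulary
    return {w: s / z for w, s in scores.items()}

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each test token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

# Hypothetical toy distributions over a three-word target vocabulary.
p_lm = {"the": 0.5, "cat": 0.3, "black": 0.2}   # language model (also used as q)
p_tm = {"the": 0.2, "cat": 0.6, "black": 0.2}   # translation model
features = {"cat": {"chat->cat": 1.0}}          # one source/target word-pair feature
weights = {"chat->cat": 1.2}                    # assumed, not trained

p_linear = linear_mix(p_lm, p_tm, lam=0.5)
p_memd = memd(p_lm, features, weights)

# Score the same (hypothetical) test tokens under both models.
test_tokens = ["the", "cat"]
ppl_linear = perplexity([p_linear[w] for w in test_tokens])
ppl_memd = perplexity([p_memd[w] for w in test_tokens])
```

In this form each MEMD feature contributes a single weight, whereas the component models of a linear combination each carry a full parameter table; this is one way to read the abstract's remark about parameter counts.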
