
NIST Open Machine Translation 2008 Evaluation: Stanford University's System Description

This document describes Stanford University’s first entry into a NIST MT evaluation. Our entry to the 2008 evaluation mainly focused on establishing a competent baseline with a phrase-based system similar to (Och and Ney, 2004; Koehn et al., 2007). In a three-week effort prior to the evaluation, our attention focused on scaling up our system to exploit nearly all Chinese-English parallel data permissible under the constrained track, incorporating competitive language models into the decoder using Gigaword and Google n-grams, evaluating Chinese word segmentation models, and incorporating a document classifier as a pre-processing stage to the decoder.

Christopher D. Manning | Jenny Rose Finkel | Michel Galley | Daniel Cer | Pi-Chuan Chang

[1] Hermann Ney, et al. HMM-Based Word Alignment in Statistical Translation, 1996, COLING.

[2] Lalit R. Bahl, et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[4] Lucian Vlad Lita, et al. tRuEcasIng, 2003, ACL.

[5] Andrew McCallum, et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, 2004, COLING.

[6] Wolfgang Macherey, et al. Lattice-based Minimum Error Rate Training for Statistical Machine Translation, 2008, EMNLP.

[7] Pat Langley, et al. Editorial: On Machine Learning, 1986, Machine Learning.

[8] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[9] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[10] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[11] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[12] Hermann Ney, et al. A systematic comparison of various statistical alignment models, 2003.

[13] Christopher D. Manning, et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[14] Franz Josef Och, et al. Statistical machine translation: from single word models to alignment templates, 2002.

[15] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[16] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[17] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[18] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[19] Hermann Ney, et al. The Alignment Template Approach to Statistical Machine Translation, 2004, CL.

[20] Chris Callison-Burch, et al. Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding, 2006.

[21] Mirella Lapata, et al. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999, ACL 1999.