Stanford University’s Chinese-to-English Statistical Machine Translation System for the 2008 NIST Evaluation

This document describes Stanford University’s first entry into a NIST MT evaluation. Our entry to the 2008 evaluation focused mainly on establishing a competent baseline with a phrase-based system similar to those of Och and Ney (2004) and Koehn et al. (2007). In a three-week effort prior to the evaluation, we concentrated on four tasks: scaling up our system to exploit nearly all Chinese-English parallel data permissible under the constrained track, incorporating competitive language models into the decoder using Gigaword and Google n-grams, evaluating Chinese word segmentation models, and incorporating a document classifier as a pre-processing stage to the decoder. This document is organized as follows. In Section 2, we describe the linguistic resources used for our submission. In Section 3, we present the four main components of our translation system: a phrase-based translation system, a Chinese word segmenter, a text categorizer, and a truecaser. Finally, we discuss our results in Section 4.

[1] Lalit R. Bahl, et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[3] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[4] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[5] Christopher D. Manning, et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[6] Hermann Ney, et al. The Alignment Template Approach to Statistical Machine Translation, 2004, CL.

[7] Franz Josef Och, et al. Statistical machine translation: from single word models to alignment templates, 2002.

[8] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[9] Lucian Vlad Lita, et al. tRuEcasIng, 2003, ACL.

[10] Andrew McCallum, et al. Chinese Segmentation and New Word Detection using Conditional Random Fields, 2004, COLING.

[11] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[12] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[13] Hermann Ney, et al. HMM-Based Word Alignment in Statistical Translation, 1996, COLING.

[14] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[15] Wolfgang Macherey, et al. Lattice-based Minimum Error Rate Training for Statistical Machine Translation, 2008, EMNLP.

[16] Pat Langley, et al. Editorial: On Machine Learning, 1986, Machine Learning.

[17] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.