The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

We describe the statistical machine translation (SMT) systems developed at Heidelberg University for the Chinese-to-English and Japanese-to-English PatentMT subtasks at the NTCIR10 workshop. The core system used in both subtasks is a combination of hierarchical phrase-based translation and discriminative training using either large feature sets and ℓ1/ℓ2 regularization (for Japanese-to-English) or variants of soft syntactic constraints (for Chinese-to-English). Our goal is to address the twofold nature of patents by exploiting the repetitive nature of patents through feature sharing in a multi-task learning setup (used in the Japanese-to-English translation subtask), and by countersteering complex word order differences with syntactic features (used in

Stefan Riezler | Artem Sokolov | Katharina Wäschle | Patrick Simianer | Laura Jehl | Gesa Stupperich

[1] Stefan Riezler, et al. Analyzing Parallelism and Domain Similarities in the MAREC Patent Corpus, 2012, IRFC.

[2] S. T. Buckland, et al. Computer-Intensive Methods for Testing Hypotheses, 1990.

[3] Vladimir Eidelman, et al. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models, 2010, ACL.

[4] Eiichiro Sumita, et al. Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop, 2013, NTCIR.

[5] Phil Blunsom, et al. Probabilistic Inference for Machine Translation, 2008, EMNLP.

[6] Philip Resnik, et al. Soft Syntactic Constraints for Hierarchical Phrased-Based Translation, 2008, ACL.

[7] Hermann Ney, et al. Analysing soft syntax features and heuristics for hierarchical phrase based machine translation, 2008, IWSLT.

[8] Alon Lavie, et al. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability, 2011, ACL.

[9] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[10] Philipp Koehn, et al. Empirical Methods for Compound Splitting, 2003, EACL.

[11] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[12] Adam Lopez, et al. Hierarchical Phrase-Based Translation with Suffix Arrays, 2007, EMNLP.

[13] Shankar Kumar, et al. Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices, 2009, ACL/IJCNLP.

[14] Mark Hopkins, et al. Tuning as Ranking, 2011, EMNLP.

[15] Anoop Sarkar, et al. Discriminative Reranking for Machine Translation, 2004, NAACL.

[16] Taro Watanabe, et al. NTT statistical machine translation for IWSLT 2006, 2006, IWSLT.

[17] Markus Freitag, et al. The RWTH Aachen System for NTCIR-10 PatentMT, 2013, NTCIR.

[18] Stefan Riezler, et al. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT, 2005, IEEvaluation@ACL.

[19] David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[20] Chris Dyer, et al. Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT, 2012, ACL.

[21] David Chiang, et al. A Hierarchical Phrase-Based Model for Statistical Machine Translation, 2005, ACL.

[22] K. J. Evans, et al. Computer Intensive Methods for Testing Hypotheses: An Introduction, 1990.

[23] Spyridon Matsoukas, et al. BBN's Systems for the Chinese-English Sub-task of the NTCIR-10 PatentMT Evaluation, 2013, NTCIR.