BBN's Systems for the Chinese-English Sub-task of the NTCIR-10 PatentMT Evaluation

This paper describes the systems we developed at Raytheon BBN Technologies for the Chinese-English sub-task of the Patent Machine Translation Task (PatentMT) at the NTCIR-10 workshop. Our systems were originally built to translate newswire articles and were subsequently adapted to address some special problems of patent documents in the NTCIR-9 PatentMT evaluation. We applied some of our recent advances in translation to the patent domain and investigated a sentence-level language model adaptation approach that takes advantage of the characteristics of patent documents. These approaches substantially improved translation quality, and our systems achieved the best results among all submissions across all evaluation types and evaluation metrics.
