On integrating hybrid and rule-based components for patent MT with several levels of output

We present a methodology that integrates hybrid and rule-based components to speed up the development of a patent MT system. The methodology is suitable for highly inflecting languages and is illustrated with the translation of patent claims from Russian into English. Depending on the combination of hybrid and rule-based components, the system performs shallow and/or deep parsing and provides several complementary levels of output: (i) translation of terminology, which involves only shallow MT procedures, and (ii) full translation, which is based on both shallow and deep parsing integrated either automatically or in an interactive environment. The full translation of a patent claim is output in two formats: the legal one-sentence format and a more readable set of simple sentences. To help control the quality of claim translation through a better understanding of the input, the system also outputs the source-language (SL) claim decomposed into simple sentences.
