Description of the Chinese-to-Spanish Rule-Based Machine Translation System Developed Using a Hybrid Combination of Human Annotation and Statistical Techniques

Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When parallel corpora are scarce, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair. This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources, such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules. The final system is freely available online and open source. Although its performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system’s coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems.
