Machine Translation for Chinese-Spanish: Experimenting with online Statistical and Rule-Based Paradigms

Due to the overwhelming growth of information in multiple languages, Machine Translation has become an essential application in our lives. Evidence of this is the continuous investment made by technology companies to develop new and improved translation systems. However, translation between distant language pairs such as Chinese and Spanish, which are commonly used in both business and daily life, has seldom been addressed from a research point of view. This project focuses on online translation between Chinese and Spanish. First, we present a brief introduction to the field of Machine Translation, with a quick overview of its history and its main approaches. We then introduce Statistical Machine Translation, the translation paradigm behind our online system, and explain its mathematical basis and the structure of its models. We also present the framework used to evaluate our system. Next, we describe how the online system has been built; it allows translation either from a web-based interface or from two mobile applications (one for Android and one for iOS). We also explain how special methods for inputting Chinese and Latin characters are included in the web-based interface and the application. Then, we present the implementation details of our statistical translation system, covering both the corpora used to train the system and the quality assessment of the resulting translations. Finally, we briefly explore the paradigm of rule-based machine translation, also between Chinese and Spanish. We explain the theory behind this kind of translation system and describe the construction of a toy system to illustrate how it works. This last task is the basis for an open-source rule-based machine translation system that is being developed within the framework of the Google Summer of Code 2013.
