TransLite - Development of Lightweight Machine Translation System based on Constraint Synchronous Grammar

Machine Translation (MT) has prompted researchers to design many different systems for the translation task. Typical rule-based MT paradigms involve a long chain of processes, including morphological analysis, part-of-speech tagging, sense disambiguation, parsing, transformation and generation. Such systems are large and time-consuming to develop, and their design has to be tailored to the languages involved, since languages usually differ in their properties and grammar. To provide a quick way of realizing a translation system for a specific domain or a controlled language, for any language pair, a lightweight system, PCT TransLite, is proposed. The whole translation process relies only on parsing with Constraint Synchronous Grammar (CSG). CSG can describe the syntactic relationships between the source and target languages simultaneously based on controlled constraints, and can model semantic information in the constituents as features for disambiguation. Since a source syntactic pattern is usually associated with more than one target pattern, the target that satisfies all the constraints defined with the rule determines the relationship between them. Moreover, CSG can easily express non-standard linguistic phenomena, including discontinuity and crossing relationships, as well as words that disappear from, or must be added to, the translation. The objective in realizing PCT TransLite is three-fold: to develop simple systems for any language pair rapidly and test their feasibility once proper rules are defined; to acquire more rules through an intuitive interface, so that they can be parsed by any Context Free Grammar parsing algorithm augmented with constraints and inference of the target structure; and to educate students, giving them a better understanding of MT development and of different problems in language analysis through the visualization of syntactic trees.
PCT TransLite has already been used to construct CSG rules for real MT systems and for educational purposes, and has received positive feedback.
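
The idea that one source pattern carries several candidate target patterns, with constraints choosing among them, can be illustrated with a small sketch. This is not the PCT TransLite implementation or the actual CSG rule notation: the names `Target`, `Rule`, `select_target` and `transfer`, the feature key `V.cate`, and the example rule are all hypothetical, and constraints are simplified to exact feature-value matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Target:
    # Positions of the source constituents, listed in target-language order.
    order: List[int]
    # Hypothetical constraints: feature name -> required value.
    constraints: Dict[str, str]


@dataclass
class Rule:
    lhs: str
    source: List[str]        # source-side categories
    targets: List[Target]    # candidate target patterns, most specific first


def select_target(rule: Rule, features: Dict[str, str]) -> Optional[Target]:
    """Return the first target whose constraints are all satisfied."""
    for target in rule.targets:
        if all(features.get(k) == v for k, v in target.constraints.items()):
            return target
    return None


def transfer(rule: Rule, constituents: List[str], features: Dict[str, str]) -> List[str]:
    """Reorder the source constituents according to the selected target."""
    target = select_target(rule, features)
    if target is None:
        raise ValueError("no target pattern satisfies the constraints")
    return [constituents[i] for i in target.order]


# Illustrative rule (assumed, not taken from the paper): VP -> V NP PP,
# where the PP moves to the front only for verbs of category "vb1".
rule = Rule(
    lhs="VP",
    source=["V", "NP", "PP"],
    targets=[
        Target(order=[2, 0, 1], constraints={"V.cate": "vb1"}),
        Target(order=[0, 1, 2], constraints={}),  # unconstrained default
    ],
)

reordered = transfer(rule, ["gave", "a book", "to Mary"], {"V.cate": "vb1"})
unchanged = transfer(rule, ["gave", "a book", "to Mary"], {"V.cate": "vb0"})
```

In this sketch the crossing relationship is encoded by the `order` list alone; a fuller model would also need to mark words that are dropped from or inserted into the target side.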