Machine translation from Japanese and French to Vietnamese, the difference among language families

Although Vietnamese is spoken by more than 90 million people worldwide (as of 2014), it is still considered a low-resource language. Vietnamese NLP still lacks resources for text and speech processing, and research on machine translation for Vietnamese in particular is very rare. This paper presents our first attempt to collect data and construct French-Vietnamese and Japanese-Vietnamese statistical machine translation systems. These two source languages, French and Japanese, have received little attention in Vietnamese-related machine translation research. The differences between each of these languages and Vietnamese can bring out interesting observations.
