A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint

Automatic word alignment between Vietnamese and Chinese is difficult because the two languages differ considerably in syntax and structure. We present a novel method for Vietnamese-Chinese word alignment that merges a variety of feature constraint models. This article describes an improved model based on the Vietnamese-Chinese progressive structure and on offset features of the word sequence. The model is formulated in a log-linear framework, with its parameters tuned by the minimum error rate algorithm, and is used to obtain the Vietnamese-Chinese alignment automatically. With IBM Model 3 as the base model of the experiments, the results suggest that the method performs well: precision and recall increase by 28.57% and 25.02%, respectively, and the alignment error rate (AER) falls by 14.25%.
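To make the log-linear combination and the reported metrics concrete, the following is a minimal sketch and not the authors' implementation. It scores candidate alignments as a weighted sum of feature values and computes precision, recall, and AER using the common definitions based on sure and possible gold links. The feature names (`ibm3_logprob`, `offset`), the weights, and the toy alignments are hypothetical and chosen only for illustration; in practice the weights would come from minimum error rate training.

```python
def log_linear_score(features, weights):
    """Unnormalised log-linear score: sum over i of lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def best_alignment(candidates, weights):
    """Return the candidate alignment with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


def alignment_metrics(predicted, sure, possible):
    """Precision, recall, and AER for sets of (source_idx, target_idx) links.

    Sure links are treated as a subset of possible links.
    """
    possible = possible | sure
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    precision = hits_possible / len(predicted) if predicted else 0.0
    recall = hits_sure / len(sure) if sure else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / denom if denom else 0.0
    return precision, recall, aer


# Hypothetical weights and feature values (assumptions, not from the paper).
weights = {"ibm3_logprob": 1.0, "offset": 0.5}
candidates = [
    {"links": {(0, 0), (1, 1)}, "features": {"ibm3_logprob": -2.0, "offset": -0.5}},
    {"links": {(0, 1), (1, 0)}, "features": {"ibm3_logprob": -1.8, "offset": -3.0}},
]

chosen = best_alignment(candidates, weights)
precision, recall, aer = alignment_metrics(
    chosen["links"], sure={(0, 0)}, possible={(1, 1)}
)
```

Here the offset feature penalises the second candidate enough that the first is chosen even though its base-model log-probability is lower, which is the kind of trade-off a weighted feature combination allows.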
