Cost-benefit Analysis of Two-Stage Conditional Random Fields based English-to-Chinese Machine Transliteration

This work presents an English-to-Chinese (E2C) machine transliteration system based on two-stage conditional random field (CRF) models, with accessor variety (AV) as an additional feature that approximates the local context of the source language. Experimental results show that the two-stage CRF method outperforms its one-stage counterpart, because the two-stage method incurs a lower cost when encoding more features and finer-grained labels.
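To make the AV feature concrete, the following is a minimal sketch of one common formulation of accessor variety: for a substring, take the smaller of the number of distinct characters seen immediately to its left and to its right in a corpus. The function name `accessor_variety`, the `max_len` cutoff, the boundary markers `^` and `$`, and the toy word list are illustrative assumptions, not details taken from this work; treating every word boundary as one shared symbol is also a simplification.

```python
from collections import defaultdict


def accessor_variety(words, max_len=3):
    """Return {substring: AV} for substrings of up to max_len characters.

    AV(s) = min(distinct left neighbours of s, distinct right neighbours of s).
    Assumption: word boundaries are represented by the single symbols
    '^' and '$', so all word-initial (or word-final) contexts count once.
    """
    left = defaultdict(set)   # substring -> set of preceding characters
    right = defaultdict(set)  # substring -> set of following characters
    for word in words:
        padded = "^" + word + "$"
        n = len(padded)
        for i in range(1, n - 1):                       # start inside the word
            for j in range(i + 1, min(i + max_len, n - 1) + 1):
                sub = padded[i:j]                       # never includes ^ or $
                left[sub].add(padded[i - 1])
                right[sub].add(padded[j])
    return {s: min(len(left[s]), len(right[s])) for s in left}


# Toy source-side corpus (hypothetical English names).
av = accessor_variety(["smith", "smythe", "asher"])
print(av["s"], av["sm"], av["th"])
```

In a CRF tagger, such scores would typically be discretised and attached to each source grapheme as an extra feature column; how this work does so is not stated in the abstract.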
