Paper Information - Transliteration by Sequence Labeling with Lattice Encodings and Reranking

We consider the task of generating transliterated word forms. To allow for a wide range of interacting features, we use a conditional random field (CRF) sequence labeling model. We then present two innovations: a training objective that optimizes toward any of a set of possible correct labels (since more than one transliteration is often possible for a particular input), and a k-best reranking stage to incorporate nonlocal features. This paper presents results on the Arabic-English transliteration task of the NEWS 2012 workshop.
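The first innovation, training toward any of a set of correct labelings, can be illustrated with a toy linear-chain model. The sketch below is only one plausible reading of the abstract, not the authors' implementation: it computes log Σ p(y | x) over a small set of acceptable label sequences, normalizing with the forward algorithm. The names `emit`, `trans`, and `references` and all score values are hypothetical, and the references are enumerated explicitly rather than encoded as a lattice.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(labels, emit, trans):
    """Unnormalized log-score of one label sequence (emission + transition scores)."""
    score = sum(emit[t][y] for t, y in enumerate(labels))
    score += sum(trans[a][b] for a, b in zip(labels, labels[1:]))
    return score

def log_partition(emit, trans):
    """Forward algorithm: log-sum of scores over all possible label sequences."""
    n_labels = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][y] + logsumexp([alpha[p] + trans[p][y] for p in range(n_labels)])
            for y in range(n_labels)
        ]
    return logsumexp(alpha)

def multi_reference_log_likelihood(references, emit, trans):
    """log of the total probability assigned to any acceptable label sequence."""
    numerator = logsumexp([sequence_score(y, emit, trans) for y in references])
    return numerator - log_partition(emit, trans)

# Hypothetical scores: 3 input positions, 2 labels (values chosen arbitrarily).
emit = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
trans = [[0.2, -0.1], [0.0, 0.4]]
# Two label sequences that are both treated as correct for this input.
references = [(0, 1, 1), (1, 1, 1)]

ll = multi_reference_log_likelihood(references, emit, trans)
print(f"multi-reference log-likelihood: {ll:.4f}")
```

Maximizing this quantity rewards the model for placing probability mass on any acceptable output instead of forcing a single reference. A real system would sum over the paths of a reference lattice with dynamic programming rather than enumerating sequences.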

Noah A. Smith | Chris Dyer | Waleed Ammar
