Transliteration in Any Language with Surrogate Languages

We introduce a method for transliteration generation that can produce transliterations in every language. Where previous approaches are only as multilingual as Wikipedia, we show how to use training data from Wikipedia languages as surrogate training data for any target language. The problem thus becomes one of ranking Wikipedia languages by their suitability for a given target language. We introduce several task-specific methods for ranking languages and show that our approach is comparable to the oracle ceiling, and even outperforms it in some cases.
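To make the ranking step concrete, here is a minimal sketch of how candidate surrogate languages might be ordered for a target language. It assumes, purely for illustration, that each language is described by a set of phoneme symbols and that suitability is scored by Jaccard overlap. The abstract does not specify the ranking methods, so the similarity measure, the function names (`jaccard`, `rank_surrogates`), and the toy inventories are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_surrogates(target_inventory, candidate_inventories):
    """Return (language, score) pairs, best surrogate first.

    Ties are broken alphabetically by language name so the order is stable.
    """
    scored = [
        (lang, jaccard(target_inventory, inventory))
        for lang, inventory in candidate_inventories.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy inventories, not real linguistic data.
target = {"a", "i", "u", "k", "t", "m"}
candidates = {
    "lang_a": {"a", "i", "u", "k", "t", "n"},
    "lang_b": {"a", "e", "o", "p"},
    "lang_c": {"a", "i", "u", "k", "t", "m", "s"},
}

ranking = rank_surrogates(target, candidates)
best_surrogate = ranking[0][0]
```

Under these assumptions, a model trained on Wikipedia data for the top-ranked language would stand in for the target language. Any other similarity score could be swapped in without changing the ranking function.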
