Paper Information - G2P Conversion of Proper Names Using Word Origin Information

Motivated by the fact that the pronunciation of a name may be influenced by its language of origin, we present methods to improve pronunciation prediction of proper names using word origin information. We train grapheme-to-phoneme (G2P) models on language-specific data sets and interpolate the outputs. We perform experiments on US surnames, a data set where word origin variation occurs naturally. Our methods can be used with any G2P algorithm that outputs posterior probabilities of phoneme sequences for a given word.
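The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes a linear mixture in which each language-specific model's posterior is weighted by an estimate of how likely the name is to come from that language; the abstract does not state the weighting scheme, so this form is an assumption. The function names, the language labels, the phoneme strings, and all probabilities below are hypothetical and for illustration only.

```python
from collections import defaultdict


def interpolate_posteriors(per_language_posteriors, origin_weights):
    """Linearly mix per-language pronunciation posteriors.

    per_language_posteriors: {language: {pronunciation: probability}}
    origin_weights: {language: weight}, e.g. an estimate of the
        probability that the word originates from that language.
    Assumption: a plain weighted sum; the paper may use another scheme.
    """
    total = sum(origin_weights.values())
    if total <= 0:
        raise ValueError("origin weights must sum to a positive value")

    mixed = defaultdict(float)
    for language, posteriors in per_language_posteriors.items():
        # Normalise so the weights over languages sum to one.
        weight = origin_weights.get(language, 0.0) / total
        for pronunciation, prob in posteriors.items():
            mixed[pronunciation] += weight * prob
    return dict(mixed)


def best_pronunciation(mixed):
    """Return the pronunciation with the highest mixed probability."""
    return max(mixed, key=mixed.get)


# Toy, made-up posteriors from two hypothetical language-specific models.
per_language = {
    "german": {"SH M IH T": 0.7, "S M IH T": 0.3},
    "english": {"S M IH T": 0.8, "SH M IH T": 0.2},
}
# Made-up word-origin weights for the same name.
weights = {"german": 0.9, "english": 0.1}

mixed = interpolate_posteriors(per_language, weights)
best = best_pronunciation(mixed)
```

Because the sketch only consumes dictionaries of posterior probabilities, it is agnostic to how each language-specific model produces them, which mirrors the abstract's point that the approach applies to any G2P algorithm that outputs such posteriors.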

Sravana Reddy | Sonjia Waxmonsky
