Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

We introduce polyglot language models: recurrent neural network models trained to predict symbol sequences in many different languages, using shared representations of symbols and conditioning on typological information about the language to be predicted. We apply these models to the problem of modeling phone sequences, a domain in which universal symbol inventories and cross-linguistically shared feature representations are a natural fit. Intrinsic evaluation of held-out perplexity, qualitative analysis of the learned representations, and extrinsic evaluation in two downstream applications that make use of phonetic features show (i) that polyglot models generalize better to held-out data than comparable monolingual models and (ii) that polyglot phonetic feature representations are of higher quality than those learned monolingually.
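The core mechanism the abstract describes can be illustrated with a minimal sketch: every language draws its phone embeddings from one shared table, and the language identity enters the model as a typology vector concatenated to each input. The phone inventory, typology vectors, dimensions, and the plain tanh-RNN step below are all illustrative assumptions (the paper's models are LSTMs), not the authors' actual configuration.

```python
import math
import random

random.seed(0)

# Hypothetical shared phone inventory and per-language typology vectors;
# stand-ins for the paper's shared symbol embeddings and conditioning input.
PHONES = ["p", "b", "a", "i", "n", "</s>"]
EMB_DIM, TYP_DIM, HID_DIM = 4, 3, 5
IN_DIM = EMB_DIM + TYP_DIM

phone_emb = {p: [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
             for p in PHONES}          # one embedding table, shared by all languages
typology = {                           # one fixed feature vector per language
    "eng": [1.0, 0.0, 1.0],
    "hin": [0.0, 1.0, 1.0],
}

W_xh = [[random.uniform(-0.1, 0.1) for _ in range(IN_DIM)] for _ in range(HID_DIM)]
W_hh = [[random.uniform(-0.1, 0.1) for _ in range(HID_DIM)] for _ in range(HID_DIM)]

def rnn_step(x, h):
    """One vanilla tanh-RNN step (the paper uses LSTMs; simplified here)."""
    return [math.tanh(sum(W_xh[j][k] * x[k] for k in range(IN_DIM)) +
                      sum(W_hh[j][k] * h[k] for k in range(HID_DIM)))
            for j in range(HID_DIM)]

def encode(phone_seq, lang):
    """Run the polyglot RNN over a phone sequence for one language: each
    timestep's input is the shared phone embedding concatenated with the
    language's typology vector."""
    h = [0.0] * HID_DIM
    for p in phone_seq:
        x = phone_emb[p] + typology[lang]   # conditioning by concatenation
        h = rnn_step(x, h)
    return h

# The same phone string under different language conditioning yields
# different hidden states, while the phone embeddings remain shared.
h_eng = encode(["p", "a", "n"], "eng")
h_hin = encode(["p", "a", "n"], "hin")
```

In a trained model the final hidden state would feed a softmax over the shared phone inventory to predict the next symbol; here the point is only that a single parameter set serves all languages, with the typology vector steering the computation per language.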
