Modeling Code-Switch Languages Using Bilingual Parallel Corpus

Language modeling is the task of estimating the probability of a sequence of words. A bilingual language model is expected to model the sequential dependency of words across languages, which is difficult because suitable training data is inherently scarce and syntactic structure differs across languages. We propose a bilingual attention language model (BALM) that jointly optimizes a language modeling objective and a quasi-translation objective to model both monolingual and cross-lingual sequential dependency. The attention mechanism learns the bilingual context from a parallel corpus. BALM achieves state-of-the-art performance on the SEAME code-switch database, reducing perplexity by 20.5% relative to the best previously reported result. We also apply BALM to bilingual lexicon induction and language normalization tasks to validate the idea.
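To make the evaluation metric concrete, the following minimal sketch computes perplexity from per-token conditional probabilities and the relative reduction between two perplexity values. It is an illustration only: the function names `perplexity` and `relative_reduction` and all numbers in it are hypothetical and are not taken from the paper or from SEAME.

```python
import math


def perplexity(token_probs):
    """Perplexity of a word sequence, given the model's conditional
    probability for each token: exp of the average negative log-probability."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


def relative_reduction(baseline_ppl, new_ppl):
    """Fraction by which new_ppl is lower than baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among four words.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

# Hypothetical perplexities, chosen only to show the arithmetic of a
# relative reduction; they are not results from the paper.
reduction = relative_reduction(100.0, 79.5)
print(f"perplexity={uniform_ppl:.2f}, relative reduction={reduction:.1%}")
```

Lower perplexity means the model assigns higher probability to the held-out text, so a relative reduction is measured against the baseline value rather than as an absolute difference.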
