Cross-Lingual Word Embeddings and the Structure of the Human Bilingual Lexicon

Research on the bilingual lexicon has uncovered fascinating interactions between the native-language and second-language lexicons of bilingual speakers. In particular, the lexicon of the native language has been found to affect the organisation of the second-language lexicon. In the spirit of interpreting current distributed representations, this paper investigates two models of cross-lingual word embeddings, comparing them to the shared-translation effect and to the cross-lingual coactivation effects of false friends and true friends (cognates) found in humans. We find that the similarity structure of the cross-lingual word embedding space yields the same effects as the human bilingual lexicon.
