Multilingual Distributional Semantic Models: Toward a Computational Model of the Bilingual Mental Lexicon

In this paper, we propose a novel framework for multilingual distributional semantic models, intended as a psychologically plausible computational model of the bilingual mental lexicon. In the proposed framework, a monolingual semantic space is first generated for each target language from the corresponding monolingual corpus. These monolingual semantic spaces are then converted into spaces that share common dimensions, which are in turn integrated into a single multilingual semantic space. The language in which the dimensions are expressed, which we refer to as the pivot language, determines the type of bilingual simulated by the model. We tested the psychological plausibility of the proposed model by comparing the cosine similarity it computes with cross-language word similarity ratings given by L1 Japanese/L2 English sequential bilinguals. The bilingual semantic space with Japanese as the pivot language, which is predicted to model L1 Japanese/L2 English sequential bilinguals, achieved better performance in simulating the similarity rating data. This result supports the plausibility of the proposed multilingual model.
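The pipeline described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the abstract does not specify how monolingual spaces are converted to common dimensions, so the sketch assumes a hypothetical one-to-one translation dictionary over context words. The function names (`to_pivot_space`, `integrate`, `cosine`) and the toy vectors are likewise invented for illustration.

```python
import numpy as np


def to_pivot_space(vectors, dims, dim_to_pivot, pivot_dims):
    """Re-express word vectors over pivot-language dimensions.

    ASSUMPTION: dimensions are context words, and `dim_to_pivot` is a
    hypothetical dictionary mapping each context word to a pivot-language
    word. Dimensions with no translation are dropped.
    """
    index = {p: i for i, p in enumerate(pivot_dims)}
    converted = {}
    for word, vec in vectors.items():
        projected = np.zeros(len(pivot_dims))
        for value, dim in zip(vec, dims):
            pivot = dim_to_pivot.get(dim)
            if pivot in index:
                projected[index[pivot]] += value
        converted[word] = projected
    return converted


def integrate(spaces):
    """Merge converted spaces into one, tagging words by language."""
    merged = {}
    for lang, space in spaces.items():
        for word, vec in space.items():
            merged[f"{lang}:{word}"] = vec
    return merged


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


# Toy monolingual spaces (invented co-occurrence counts).
ja_dims = ["taberu", "nomu"]
en_dims = ["eat", "drink"]
ja_vectors = {"gohan": np.array([3.0, 0.0]), "mizu": np.array([0.0, 2.0])}
en_vectors = {"rice": np.array([4.0, 1.0]), "water": np.array([0.0, 5.0])}

# Japanese is the pivot language, so English dimensions are translated.
en_to_ja = {"eat": "taberu", "drink": "nomu"}
ja_to_ja = {d: d for d in ja_dims}

space = integrate({
    "ja": to_pivot_space(ja_vectors, ja_dims, ja_to_ja, ja_dims),
    "en": to_pivot_space(en_vectors, en_dims, en_to_ja, ja_dims),
})

sim_related = cosine(space["ja:gohan"], space["en:rice"])
sim_unrelated = cosine(space["ja:gohan"], space["en:water"])
```

In this toy setting, swapping the pivot language would mean translating the Japanese dimensions into English instead, which corresponds to the choice that the framework uses to simulate different types of bilinguals.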
