Modelling Reading Times in Bilingual Sentence Comprehension

Stefan L. Frank (s.frank@let.ru.nl)
Centre for Language Studies, Radboud University Nijmegen, The Netherlands

Abstract

Relatively little is known about the interaction between a bilingual's two languages beyond the word level. This paper investigates the issue by comparing word reading times (RTs) in both L1 and L2 to quantitative predictions by statistical language models. Recurrent neural networks are trained on either a Dutch corpus, an English corpus, or the two corpora combined (i.e., the bilingual network treats the two languages as one). Next, estimates of word surprisal by the three models are compared to RTs by native Dutch speakers on L1 Dutch and L2 English sentences. The monolingual Dutch model accounts for RTs on Dutch better than the bilingual model. In contrast, the bilingual model outperforms the monolingual English model on English RTs. These findings suggest that sentence comprehension in L1 is not much affected by L2 knowledge, whereas L2 reading does show interference from L1.

Keywords: Bilingualism; sentence comprehension; recurrent neural networks; word surprisal; word reading time

Introduction

Reading time (RT) effects on interlingual homographs and cognates have revealed that L1 knowledge affects L2 reading (Duyck, Van Assche, Drieghe, & Hartsuiker, 2007) and, vice versa, L2 knowledge affects L1 reading (Van Assche, Duyck, Hartsuiker, & Diependaele, 2009). However, whether these effects are modulated by sentence context (rather than being merely lexical phenomena) is still controversial (Libben & Titone, 2009; Van Assche, Drieghe, Duyck, Welvaert, & Hartsuiker, 2011).

RT on a word depends, among other things, on the word's occurrence probability given the sentence so far.
More precisely, a positive correlation has been found between RT and the negative logarithm of word probability, a value known as the word's surprisal (Fernandez Monsalve, Frank, & Vigliocco, 2012; Smith & Levy, 2013). Word surprisal can be estimated by statistical language models that are trained on large text corpora. So far, such work has only made use of models that process a single language (predominantly English), but if a bilingual's two languages influence each other during reading, bilingual (as opposed to monolingual) language models may provide a more accurate account of bilingual reading behaviour.

Modelling bilingual sentence processing

When recurrent neural networks (RNNs) are applied as statistical language models, their surprisal estimates regularly outperform those from other model types in predicting RTs (Frank & Bod, 2011; Frank & Thompson, 2012) as well as N400 size (Frank, Otten, Galli, & Vigliocco, 2013). Moreover, RNNs provide a straightforward account of how two languages may be combined into a single system, as the network's hidden layer can be activated by word input from either language without receiving any (explicit) information about language identity. French (1998) presents an early example of such a bilingual RNN, trained on two artificial miniature languages that were modelled on French and English.

Since the current objective is to accurately estimate surprisal values for words from experimental stimuli or naturally occurring sentences, an RNN implementation is required that allows for training on large corpora of natural text. The highly efficient implementation by Mikolov, Deoras, Povey, Burget, and Černocký (2011) is well suited to this purpose. Three RNNs were trained: one on a Dutch corpus, one on an English corpus, and one on the two corpora combined. Hence, there are two monolingual networks (henceforth, RNN_Dutch and RNN_English) and one bilingual network (RNN_bi).
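The surprisal measure described above depends only on a model's conditional word probabilities, so any language model can supply it. As a minimal illustration (a toy bigram counter standing in for the RNN; the function names and data are hypothetical, not the paper's implementation):

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram occurrences in a tokenised corpus (list of sentences)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Pair each word with its predecessor; "<s>" marks sentence start.
        for prev, word in zip(["<s>"] + sentence, sentence):
            counts[prev][word] += 1
    return counts

def surprisal(counts, prev, word):
    """Surprisal in bits: -log2 P(word | prev)."""
    total = sum(counts[prev].values())
    p = counts[prev][word] / total
    return -math.log2(p)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = train_bigram(corpus)
print(surprisal(counts, "<s>", "the"))  # P = 1.0, so 0 bits
print(surprisal(counts, "the", "cat"))  # P = 0.5, so 1 bit
```

A fully predictable word carries zero surprisal; halving the probability adds one bit, which is the logarithmic relation to RT reported by Smith and Levy (2013).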
Dutch training data came from part of the COW web corpus (Schäfer & Bildhauer, 2012; 5.8M sentences, 107M word tokens, 314K word types) and English data was taken from the British National Corpus (4.5M sentences, 87M word tokens, 182K word types). The three RNNs are architecturally identical, except for their number of input and output nodes, which must match the number of word types in the training corpus. Hence, the only thing that makes a network Dutch, English, or bilingual is the language(s) it is trained on.

The RNNs embody two extreme views on bilingual processing: The monolingual models allow no effect of the other language whatsoever, whereas the bilingual model treats the two languages as one. Most likely, bilingual sentence comprehension falls somewhere in between these two poles. Fitting surprisal to RT should reveal which of the two extreme positions is most like bilingual reading. To the extent that bilinguals are affected by the language not currently being used, surprisal estimates by RNN_bi should fit the RT data better than surprisals from a monolingual RNN.

Results and conclusion

Surprisal values were obtained on each word of the 56 filler (i.e., non-target) sentences from a study by Frank, Trompenaars, and Vasishth (2014), who collected self-paced RTs from 46 native Dutch speakers tested in either Dutch (N = 24) or English (N = 22). RNN_Dutch processed Dutch sentences, RNN_English processed English, and RNN_bi processed both, yielding four sets of surprisal estimates. A significant amount of variance in RTs was accounted for by each set of surprisals (all p < .0001 in a linear mixed-effects regression analysis) over and above word length and word log-frequency. The main question of interest is whether the monolingual RNNs' surprisals fit the data better or worse than surprisal from RNN_bi.
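Since the networks differ only in their training corpus and, consequently, in their vocabulary size, the bilingual setup amounts to nothing more than training on the concatenated corpora. A minimal sketch of that idea (hypothetical toy corpora, not the actual training pipeline):

```python
def build_vocab(corpus):
    """Word types in a tokenised corpus; the number of types fixes the
    number of input and output nodes of the corresponding RNN."""
    return {word for sentence in corpus for word in sentence}

dutch = [["de", "kat", "zit"]]
english = [["the", "cat", "sits"]]

vocab_nl = build_vocab(dutch)
vocab_en = build_vocab(english)
# The bilingual network is trained on the combined corpus: its vocabulary
# is the union of the two, and no input node marks language identity.
vocab_bi = build_vocab(dutch + english)

assert vocab_bi == vocab_nl | vocab_en
print(len(vocab_nl), len(vocab_en), len(vocab_bi))  # 3 3 6
```

Nothing in the bilingual network's input tells it which language a word belongs to; any separation between the languages must emerge from the distributional statistics alone, as in French (1998).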
Hence, we compare the fit to RTs of two regression models that differ only in the source of their surprisal values: one includes surprisal estimates by a monolingual RNN (i.e., RNN_Dutch for Dutch; RNN_English for English) and the other takes RNN_bi's surprisals (on either Dutch or
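The logic of comparing regression models that differ only in their surprisal predictor can be illustrated in simplified form, with ordinary least squares standing in for the paper's mixed-effects analysis (all data below are hypothetical):

```python
def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def sse(x, y):
    """Sum of squared residuals under the fitted line: lower = better fit."""
    a, b = ols_fit(x, y)
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical word RTs and two competing surprisal estimates for the
# same words (e.g., from a monolingual vs. a bilingual model).
rts       = [310.0, 350.0, 330.0, 400.0]
surp_mono = [2.0, 4.0, 3.0, 6.0]
surp_bi   = [2.1, 3.9, 3.2, 5.8]

# The surprisal source leaving less residual variance fits the RTs better.
better = "mono" if sse(surp_mono, rts) < sse(surp_bi, rts) else "bi"
print(better)
```

The actual analysis additionally includes word length and log-frequency as covariates plus random effects for subjects and items, but the comparison principle is the same: hold everything constant except the surprisal source and see which regression model leaves less unexplained variance.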

References

Duyck, W., Van Assche, E., Drieghe, D., & Hartsuiker, R. J. (2007). Visual word recognition by bilinguals in a sentence context: Evidence for nonselective lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition.

Fernandez Monsalve, I., Frank, S. L., & Vigliocco, G. (2012). Lexical surprisal as a general predictor of reading time. Proceedings of EACL.

Frank, S. L., & Bod, R. (2011). Insensitivity of the human sentence-processing system to hierarchical structure. Psychological Science.

Frank, S. L., Fernandez Monsalve, I., Thompson, R. L., & Vigliocco, G. (2013). Reading time data for evaluating broad-coverage models of English sentence processing. Behavior Research Methods.

Frank, S. L., Otten, L. J., Galli, G., & Vigliocco, G. (2013). Word surprisal predicts N400 amplitude during reading. Proceedings of ACL.

Frank, S. L., & Thompson, R. L. (2012). Early effects of word surprisal on pupil size during reading. Proceedings of CogSci.

Frank, S. L., Trompenaars, T., & Vasishth, S. (2016). Cross-linguistic differences in processing double-embedded relative clauses: Working-memory constraints or language statistics? Proceedings of CogSci.

French, R. M. (1998). A simple recurrent network model of bilingual memory.

Libben, M. R., & Titone, D. A. (2009). Bilingual lexical access in context: Evidence from eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition.

Mikolov, T., Deoras, A., Povey, D., Burget, L., & Černocký, J. (2011). Strategies for training large scale neural network language models. IEEE Workshop on Automatic Speech Recognition & Understanding.

Schäfer, R., & Bildhauer, F. (2012). Building large corpora from the web using a new efficient tool chain. Proceedings of LREC.

Smith, N. J., & Levy, R. (2013). The effect of word predictability on reading time is logarithmic. Cognition.

Van Assche, E., Drieghe, D., Duyck, W., Welvaert, M., & Hartsuiker, R. J. (2011). The influence of semantic constraints on bilingual word recognition during sentence reading.

Van Assche, E., Duyck, W., Hartsuiker, R. J., & Diependaele, K. (2009). Does bilingualism change native-language reading? Psychological Science.

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review.