Character-based Surprisal as a Model of Human Reading in the Presence of Errors

Intuitively, human readers cope easily with errors in text: typos, misspellings, word substitutions, and similar errors do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear whether this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
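To make the notion of character-based surprisal concrete, the following is a minimal sketch in Python. It is not the model used in the study: a smoothed character bigram model is swapped in for illustration, and the toy corpus, the function names (`train_char_bigram`, `word_surprisal`), the boundary markers, and the smoothing constant `k` are all assumptions. The surprisal of a word is computed as the sum of the surprisals of its characters, each given the preceding character, so a transposition that creates unseen letter combinations receives a higher value than the correct spelling.

```python
import math
from collections import Counter

# Toy training data (hypothetical; a real model would use a large corpus).
TOY_CORPUS = ["reading", "readers", "read", "text", "errors",
              "error", "words", "word", "letters", "letter"]


def train_char_bigram(corpus_words):
    """Count character bigrams over words padded with boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for word in corpus_words:
        chars = ["<"] + list(word) + [">"]  # "<" and ">" mark word edges
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def word_surprisal(word, model, k=0.1):
    """Sum of per-character surprisals, in bits, under add-k smoothing."""
    bigrams, contexts, vocab = model
    vocab_size = len(vocab) + 1  # one extra slot for unseen characters
    chars = ["<"] + list(word) + [">"]
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        prob = (bigrams[(prev, cur)] + k) / (contexts[prev] + k * vocab_size)
        total += -math.log2(prob)
    return total


model = train_char_bigram(TOY_CORPUS)
correct = word_surprisal("reading", model)
transposed = word_surprisal("raeding", model)  # "ea" swapped to "ae"
print(f"reading: {correct:.2f} bits, raeding: {transposed:.2f} bits")
```

In this sketch, "raeding" contains bigrams that never occur in the toy corpus, so its summed surprisal exceeds that of "reading". A bigram model conditions only on the previous character within a word, so it cannot capture the error rate effect, which depends on how well the preceding context predicts upcoming words.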
