PIHKers at CMCL 2021 Shared Task: Cosine Similarity and Surprisal to Predict Human Reading Patterns.

Psycholinguistic eye-tracking studies have shown that the semantic coherence between a word and its context, as well as the word's predictability, influence language processing. In this paper we present our approach to predicting eye-tracking features from the ZuCo dataset for the shared task of the Cognitive Modeling and Computational Linguistics (CMCL 2021) workshop. Using both cosine similarity and surprisal as predictors in a regression model, we significantly improved on the baseline Mean Absolute Error computed across five eye-tracking features.
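To make the two predictors concrete, the following is a minimal sketch of how cosine similarity and surprisal could be computed and fed into a regression model. It is not the system described here: the embeddings, the language model probabilities, the target values, and the choice of ordinary least squares are all assumptions made for illustration, and every array below is randomly generated toy data.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def surprisal(prob):
    """Surprisal in bits: -log2 P(word | context)."""
    return float(-np.log2(prob))

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical toy data standing in for real embeddings, language model
# probabilities, and one eye-tracking feature (none of it comes from ZuCo).
rng = np.random.default_rng(0)
n_words, dim = 50, 8
word_vecs = rng.normal(size=(n_words, dim))      # assumed word embeddings
context_vecs = rng.normal(size=(n_words, dim))   # assumed context vectors
probs = rng.uniform(0.01, 0.9, size=n_words)     # assumed P(word | context)
target = rng.uniform(0, 100, size=n_words)       # assumed eye-tracking feature

# Design matrix: cosine similarity, surprisal, and an intercept column.
X = np.column_stack([
    [cosine_similarity(w, c) for w, c in zip(word_vecs, context_vecs)],
    [surprisal(p) for p in probs],
    np.ones(n_words),
])

# Ordinary least squares fit (an assumption; any regressor could be used).
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
pred = X @ coef

# Compare against a mean-value baseline using Mean Absolute Error.
mae_model = mean_absolute_error(target, pred)
mae_baseline = mean_absolute_error(target, np.full(n_words, target.mean()))
```

With random inputs the two error values carry no meaning; the sketch only shows where each predictor enters the model and how the error metric is computed.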
