Using Gaze Data to Predict Multiword Expressions

In recent years, gaze data has been increasingly used to improve and evaluate NLP models because it carries information about the cognitive processing of linguistic phenomena. In this paper, we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We compare a part-of-speech (POS) and frequency baseline against: i) a prediction model based solely on gaze data and ii) a combined model using gaze data, POS and frequency. Despite the challenging nature of the task, the combined model achieved the best performance. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is equally discriminative for the task. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.
