Comparing and Predicting Eye-tracking Data of Mandarin and Cantonese

Eye-tracking data in Chinese languages present unique challenges due to the non-alphabetic and unspaced nature of the Chinese writing systems. This paper introduces the first deeply annotated joint Mandarin-Cantonese eye-tracking dataset, on which we build a unified eye-tracking prediction system for both language varieties. In addition to the commonly studied first fixation duration and total fixation duration, the dataset includes the second fixation duration, which captures fixation patterns more relevant to higher-level, structural processing. A basic comparison of the features and measurements in our dataset revealed variation between Mandarin and Cantonese in fixation patterns related to word class and word position. A test of feature usefulness suggested that traditional features are less powerful in predicting the second-pass fixation, for which linear distance to the root is the leading contributor in Mandarin. In contrast, Cantonese eye-movement behavior relies more on word position and part of speech.
