ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading

We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data from 12 healthy adult native English speakers, each of whom read natural English text for 4–6 hours. The recordings span two normal reading tasks and one task-specific reading task, yielding EEG and eye-tracking data for 21,629 words in 1,107 sentences, with 154,173 fixations. We believe that this dataset is a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular information extraction tasks such as entity and relation extraction, as well as sentiment analysis. Moreover, the dataset is useful for advancing research into human reading and language understanding at the level of brain activity and eye movements.

Design Type(s): time series design • process-based data analysis objective • natural language processing objective
Measurement Type(s): brain activity measurement • eye movement
Technology Type(s): electroencephalography • eye tracking device
Factor Type(s): age • biological sex
Sample Characteristic(s): Homo sapiens • brain • eye
