The Dundee Treebank

We introduce the Dundee Treebank, a Universal Dependencies-style syntactic annotation layer on top of the English portion of the Dundee Corpus. Because the Dundee Corpus is an important resource for large-scale psycholinguistic research, we aim to facilitate further research in the field by replacing automatic parses with manually assigned syntax. We report on the construction of the treebank, on parsing experiments, and on a replication of a broad-scale psycholinguistic study, now for the first time using manually assigned syntactic dependencies.
