Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks

Many shallow natural language understanding tasks use dependency trees to extract relations between content words. However, strict surface-structure dependency trees tend to follow the linguistic structure of sentences too closely and frequently fail to provide direct relations between content words. To mitigate this problem, the original Stanford Dependencies representation also defines two dependency graph representations that contain additional and augmented relations, which make otherwise implicit relations between content words explicit. In this paper, we revisit and extend these dependency graph representations in light of the recent Universal Dependencies (UD) initiative and provide a detailed account of an enhanced and an enhanced++ English UD representation. We further present a converter from constituency trees to basic (i.e., strict surface-structure) UD trees, and a converter from basic UD trees to enhanced and enhanced++ English UD graphs. We release both converters as part of Stanford CoreNLP and the Stanford Parser.
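
To make the notion of an "augmented relation" concrete, the following is a minimal sketch of one such enhancement: folding a preposition's `case` marker into the `nmod` relation that attaches its nominal. It is an illustration only and not the converter released with Stanford CoreNLP. The `Edge` type, the `enhance` function, the example sentence, and the `nmod:on` label style are assumptions made for this sketch, and tokens are assumed to be unique strings.

```python
from typing import List, Tuple

# Hypothetical edge type for this sketch: (head, relation, dependent).
# Tokens are plain strings and are assumed to be unique within the sentence.
Edge = Tuple[str, str, str]


def enhance(edges: List[Edge]) -> List[Edge]:
    """Toy enhancement: append each nominal's case marker to its nmod label."""
    # Map each nominal to its case marker, e.g. "mat" -> "on".
    case_of = {head: dep.lower() for head, rel, dep in edges if rel == "case"}

    enhanced = []
    for head, rel, dep in edges:
        # Augment only nmod edges whose dependent carries a case marker.
        if rel == "nmod" and dep in case_of:
            rel = f"nmod:{case_of[dep]}"
        enhanced.append((head, rel, dep))
    return enhanced


# Basic (surface-structure) edges for "The cat sat on the mat".
basic = [
    ("sat", "nsubj", "cat"),
    ("cat", "det", "The"),
    ("sat", "nmod", "mat"),
    ("mat", "case", "on"),
    ("mat", "det", "the"),
]

for edge in enhance(basic):
    print(edge)
```

In this sketch the preposition is recorded on the edge between the two content words, so a downstream task can read the relation between "sat" and "mat" from a single edge instead of also inspecting the `case` dependent.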
