Dependency parsing with structure-preserving embeddings

Modern neural approaches to dependency parsing are trained to predict a tree structure by jointly learning a contextual representation for the tokens in a sentence and a head–dependent scoring function. While this strategy achieves high performance, the resulting representations are difficult to interpret in relation to the geometry of the underlying tree structure. Our work instead seeks to learn interpretable representations by training a parser to explicitly preserve structural properties of a tree. We do so by casting dependency parsing as a tree embedding problem, incorporating geometric properties of dependency trees as training losses within a graph-based parser. We provide a thorough evaluation of these geometric losses, showing that a majority of them yield strong tree distance preservation as well as parsing performance on par with a competitive graph-based parser (Qi et al., 2018). Finally, we analyze where parsing errors lie in terms of tree relationships in order to guide future work.
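The abstract does not spell out the geometric losses themselves, but one natural instance of "tree distance preservation" (in the spirit of structural-probe-style distance objectives) can be sketched as follows. This is an illustrative assumption, not the paper's exact objective: the function names are hypothetical, and the choice of squared-L2 embedding distance matched to tree path distance under an L1 penalty is one plausible formulation among several.

```python
import numpy as np

def tree_distances(heads):
    """Pairwise path distances in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    n = len(heads)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for i, h in enumerate(heads):
        if h >= 0:  # each head-dependent arc is an edge of length 1
            dist[i, h] = dist[h, i] = 1.0
    # Floyd-Warshall; O(n^3) is acceptable at sentence length
    for k in range(n):
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist

def distance_preservation_loss(emb, heads):
    """Mean absolute gap between squared embedding distances
    and gold tree path distances, averaged over token pairs."""
    target = tree_distances(heads)
    diff = emb[:, None, :] - emb[None, :, :]
    pred = (diff ** 2).sum(-1)  # squared L2 distance for every pair
    n = len(heads)
    return np.abs(pred - target).sum() / (n * n)
```

Minimizing such a loss alongside the usual arc-scoring objective pushes the learned token embeddings toward an isometric image of the gold tree, which is what makes the resulting geometry interpretable.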
