Graph-to-Graph Transformer for Transition-based Dependency Parsing

We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer results in significant improvements, both with and without BERT pre-training. The novel baselines and their integration with Graph2Graph Transformer significantly outperform the state of the art in traditional transition-based dependency parsing on both the English Penn Treebank and 13 languages of the Universal Dependencies treebanks. Graph2Graph Transformer can be integrated with many previous structured prediction methods, making it easy to apply to a wide range of NLP tasks.
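The abstract's "mechanism for conditioning on graphs" can be illustrated with a minimal sketch: self-attention whose scores are biased by embeddings of the dependency relation between each token pair, in the style of relative-position representations. This is an illustrative assumption about the mechanism, not the paper's exact formulation; all function and variable names (`graph_conditioned_attention`, `edge_emb`, etc.) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_conditioned_attention(X, Wq, Wk, Wv, edge_emb):
    """Single-head self-attention biased by graph-edge embeddings.

    X:        (n, d) token representations
    Wq/Wk/Wv: (d, d) projection matrices
    edge_emb: (n, n, d) embedding of the dependency relation (or a
              'no-edge' label) between each ordered token pair; it is
              added on the key side, so score[i, j] = q_i . (k_j + r_ij).
    Returns the attended values and the attention matrix.
    """
    n, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = np.einsum('id,jd->ij', Q, K)          # content-content term
    scores += np.einsum('id,ijd->ij', Q, edge_emb)  # content-edge bias
    A = softmax(scores / np.sqrt(d))
    return A @ V, A
```

Predicting graphs would then amount to decoding edges (here, parser transitions) from these graph-aware representations, and feeding predicted edges back in as `edge_emb` on later passes.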
