Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited

Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation with an error analysis of parsing experiments on 13 languages.
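To make the transition-based side of this contrast concrete, the following is a minimal sketch (not the paper's implementation) of greedy arc-standard parsing: the parser builds a tree through a sequence of local SHIFT / LEFT-ARC / RIGHT-ARC decisions over a stack and buffer, which is exactly where a wrong local decision can propagate. The gold-head "oracle" here is a hypothetical stand-in for the learned local scorer a real parser would use.

```python
def arc_standard_parse(words, gold_heads):
    """Greedy arc-standard parsing with a hypothetical gold-head oracle.

    gold_heads maps each 1-based token index to its head index (0 = ROOT).
    Returns a dict of predicted heads. Assumes a projective gold tree.
    """
    stack = [0]                              # ROOT starts on the stack
    buffer = list(range(1, len(words) + 1))  # token indices left to process
    arcs = {}

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            # LEFT-ARC: attach s1 as a dependent of s0
            if s1 != 0 and gold_heads[s1] == s0:
                arcs[s1] = s0
                stack.pop(-2)
                continue
            # RIGHT-ARC: attach s0 as a dependent of s1, but only after
            # s0 has collected all of its own dependents from the buffer
            if gold_heads[s0] == s1 and not any(gold_heads[b] == s0 for b in buffer):
                arcs[s0] = s1
                stack.pop()
                continue
        stack.append(buffer.pop(0))          # SHIFT the next token
    return arcs
```

A graph-based parser would instead score all candidate head-dependent arcs and extract a maximum spanning tree globally; the paper's point is that contextualized embeddings let the local decisions above approximate that global view.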
