Multilingual Lexicalized Constituency Parsing with Word-Level Auxiliary Tasks

We introduce a constituency parser based on a bi-LSTM encoder adapted from re- cent work (Cross and Huang, 2016b; Kiperwasser and Goldberg, 2016), which can incorporate a lower level character bi- LSTM (Ballesteros et al., 2015; Plank et al., 2016). We model two important in- terfaces of constituency parsing with aux- iliary tasks supervised at the word level: (i) part-of-speech (POS) and morpholog- ical tagging, (ii) functional label predic- tion. On the SPMRL dataset, our parser obtains above state-of-the-art results on constituency parsing without requiring ei- ther predicted POS or morphological tags, and outputs labelled dependency trees.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  Daniel Jurafsky,et al.  Parsing to Stanford Dependencies: Trade-offs between Speed and Accuracy , 2010, LREC.

[3]  Alon Lavie,et al.  A Classifier-Based Parser with Linear Run-Time Complexity , 2005, IWPT.

[4]  Benoît Crabbé,et al.  Multilingual discriminative lexicalized phrase structure parsing , 2015, EMNLP.

[5]  Hinrich Schütze,et al.  Efficient Higher-Order CRFs for Morphological Tagging , 2013, EMNLP.

[6]  Noah A. Smith,et al.  Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs , 2015, EMNLP.

[7]  James Cross,et al.  Incremental Parsing with Minimal Features Using Bi-Directional LSTM , 2016, ACL.

[8]  Noah A. Smith,et al.  Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[9]  Alon Lavie,et al.  A Best-First Probabilistic Shift-Reduce Parser , 2006, ACL.

[10]  Agnieszka Falenska,et al.  Introducing the IMS-Wrocław-Szeged-CIS entry at the SPMRL 2014 Shared Task: Reranking and Morpho-syntax meet Unlabeled Data , 2014 .

[11]  Nizar Habash,et al.  Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages , 2013, SPMRL@EMNLP.

[12]  Wolfgang Seeker,et al.  (Re)ranking Meets Morphosyntax: State-of-the-art Results from the SPMRL 2013 Shared Task , 2013, SPMRL@EMNLP.

[13]  Barbara Plank,et al.  Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss , 2016, ACL.

[14]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[15]  James Cross,et al.  Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles , 2016, EMNLP.

[16]  Eliyahu Kiperwasser,et al.  Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations , 2016, TACL.