Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework

We explore the consequences of representing token segmentations as hierarchical structures (trees) for Multiword Expression (MWE) recognition, both in isolation and in combination with dependency parsing. To this end, we propose a novel representation of token segmentation as trees over tokens, resembling dependency trees. Using this representation, we present and evaluate two architectures that combine MWE recognition and dependency parsing in the easy-first framework: a pipeline and a joint system, both of which exploit the lexical and syntactic dimensions. Our experiments show that MWE recognition significantly helps syntactic parsing.
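
To make the idea of a segmentation "as trees on tokens" concrete, here is a minimal sketch of one possible encoding. It is an illustration only, not the representation defined in the paper: the flat attachment of every non-initial MWE token to the first token of the expression, the `-1` root marker, and the function names `segmentation_to_heads` and `heads_to_segments` are all assumptions made for this example.

```python
from typing import Dict, List


def segmentation_to_heads(n_tokens: int, mwes: List[List[int]]) -> List[int]:
    """Encode a segmentation as a head array, one entry per token.

    Assumption (hypothetical encoding): each non-initial token of an MWE
    attaches to the first token of that MWE; tokens that start a lexical
    unit (including single-token units) get head -1.
    """
    heads = [-1] * n_tokens
    for mwe in mwes:
        first, *rest = sorted(mwe)
        for i in rest:
            heads[i] = first
    return heads


def heads_to_segments(heads: List[int]) -> List[List[int]]:
    """Recover lexical units by grouping tokens under their lexical root."""
    groups: Dict[int, List[int]] = {}
    for i in range(len(heads)):
        root = i
        # Follow head links until a token with no lexical head is reached.
        while heads[root] != -1:
            root = heads[root]
        groups.setdefault(root, []).append(i)
    return [groups[r] for r in sorted(groups)]


tokens = ["He", "kicked", "the", "bucket", "yesterday"]
# Treat "kicked the bucket" (token indices 1..3) as one MWE.
heads = segmentation_to_heads(len(tokens), [[1, 2, 3]])
segments = heads_to_segments(heads)
```

Because the segmentation is stored as head links, it has the same shape as a dependency tree over the same tokens, which is what would let a single parser-like system predict both structures.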
