Joint Dependency Parsing and Multiword Expression Tokenization

Complex conjunctions and determiners are often treated as pretokenized units in parsing. This is not always realistic, since such sequences can be ambiguous. We propose a model for joint dependency parsing and multiword expression identification, in which complex function words are represented as individual tokens linked by morphological dependencies. Our graph-based parser includes standard second-order features and verbal subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French Treebank enriched with morphological dependencies. It recognizes 81.79% of ADV+que conjunctions with 91.57% precision, and 82.74% of de+DET determiners with 86.70% precision.
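
The representation described above can be illustrated with a minimal Python sketch. It is not the authors' parser: the `Token` class, the `morph` arc label, the other dependency labels, the choice of the first component as head, and both example analyses are assumptions made for illustration. The sketch only shows how, once a parser has attached the components of a complex function word with a dedicated morphological arc, the multiword expressions can be read off the tree, and how an ambiguous sequence such as "bien que" yields an expression under one analysis and none under the other.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical label for the morphological dependency between MWE components.
MORPH = "morph"


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str   # surface form
    head: int   # index of the governor, 0 for the root
    label: str  # dependency label (illustrative values only)


def extract_mwes(tokens: List[Token]) -> List[List[str]]:
    """Group tokens connected by MORPH arcs into multiword expressions."""
    by_idx: Dict[int, Token] = {t.idx: t for t in tokens}

    def anchor(t: Token) -> int:
        # Follow morphological arcs upward to the first component of the unit.
        while t.label == MORPH:
            t = by_idx[t.head]
        return t.idx

    groups: Dict[int, List[str]] = {}
    for t in tokens:  # tokens are visited in sentence order
        groups.setdefault(anchor(t), []).append(t.form)
    # Keep only groups with more than one component.
    return [forms for _, forms in sorted(groups.items()) if len(forms) > 1]


# Assumed analysis 1: "bien que" is a complex conjunction ("although").
conjunction_reading = [
    Token(1, "Il", 2, "suj"),
    Token(2, "vient", 0, "root"),
    Token(3, "bien", 2, "mod"),
    Token(4, "que", 3, MORPH),
    Token(5, "Marie", 6, "suj"),
    Token(6, "soit", 3, "obj"),
    Token(7, "malade", 6, "ats"),
]

# Assumed analysis 2: "bien" is an adverb and "que" a plain complementizer.
free_reading = [
    Token(1, "Je", 2, "suj"),
    Token(2, "pense", 0, "root"),
    Token(3, "bien", 2, "mod"),
    Token(4, "que", 2, "obj"),
    Token(5, "Marie", 6, "suj"),
    Token(6, "viendra", 4, "obj"),
]

print(extract_mwes(conjunction_reading))
print(extract_mwes(free_reading))
```

Because the grouping decision is carried by an ordinary arc label, the choice between the two readings is made by the parser together with the rest of the syntactic analysis, rather than by a separate tokenization step.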
