Semantic reranking of CRF label sequences for verbal multiword expression identification

Verbal multiword expression (VMWE) identification can be addressed successfully as a sequence labelling problem with conditional random fields (CRFs), by returning the single label sequence with maximal probability. This work describes a system that reranks the 10 most likely CRF candidate VMWE sequences using a decision tree regression model. The reranker aims to operationalise the intuition that a non-compositional MWE can have a different distributional behaviour than
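The reranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I/O label names, the two toy features, and the hand-written tree `toy_tree` are all hypothetical stand-ins, and the real system would use a trained decision tree regressor over semantic features rather than the fixed rules shown here.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate is a label sequence plus the probability the CRF assigned to it.
Candidate = Tuple[List[str], float]


def rerank(candidates: Sequence[Candidate],
           features: Callable[[Candidate], List[float]],
           predict: Callable[[List[float]], float],
           top_k: int = 10) -> List[str]:
    """Return the labels of the best-scoring candidate among the top_k by CRF probability."""
    # Keep only the top_k most probable CRF candidates.
    n_best = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Score each candidate with the regressor. On ties, max() keeps the
    # first one seen, which is the candidate with the higher CRF probability.
    best = max(n_best, key=lambda c: predict(features(c)))
    return best[0]


def toy_features(candidate: Candidate) -> List[float]:
    """Hypothetical features: CRF probability and number of non-O labels."""
    labels, prob = candidate
    n_mwe_tokens = sum(1 for label in labels if label != "O")
    return [prob, float(n_mwe_tokens)]


def toy_tree(x: List[float]) -> float:
    """Hand-written stand-in for a trained regression tree (assumption)."""
    prob, n_mwe_tokens = x
    if n_mwe_tokens >= 2:
        return 0.8 if prob > 0.2 else 0.4
    return 0.5


# Made-up n-best list for a three-token sentence.
candidates = [
    (["O", "O", "O"], 0.5),
    (["B", "I", "O"], 0.3),
    (["B", "O", "O"], 0.2),
]

print(rerank(candidates, toy_features, toy_tree))
```

With `top_k=1` the function falls back to the plain CRF decision, since only the most probable sequence is considered. This makes the reranker a strict extension of the baseline that returns the single sequence with maximal probability.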
