PARSEME Survey on MWE Resources

This paper summarizes the first results of an ongoing survey on multiword resources carried out within the IC1207 COST Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogues and the inventory of multiword datasets available at the SIGLEX-MWE website, multiword resources remain scattered and difficult to find. In many cases, language resources such as corpora, treebanks or lexical databases include multiwords as part of their data or take them into account in their annotations. However, these resources need to be centralized so that other researchers can subsequently use them. The final aim of this survey is thus to create a portal where researchers can find multiword resources or multiword-aware language resources for their research. We report on how the survey was designed and analyze the data gathered so far. We also discuss the problems we detected when examining the data and possible ways of improving the survey.
