Automatic extraction of Word Combinations from corpora: Evaluating methods and benchmarks

We report on three experiments comparing two popular methods for the automatic extraction of Word Combinations from corpora, with a view to evaluating: i) their efficacy in acquiring data to be included in a combinatory resource for Italian; ii) the impact of different types of benchmarks on the evaluation itself, by analysing and comparing the evaluation methods themselves.
