A Generic Framework for Multiword Expressions Treatment: from Acquisition to Applications

This paper presents an open and flexible methodological framework for the automatic acquisition of multiword expressions (MWEs) from monolingual textual corpora. The research is motivated by the importance of MWEs for NLP applications. After briefly presenting the modules of the framework, the paper reports extrinsic evaluation results for two applications: computer-aided lexicography and statistical machine translation. Both applications can benefit from automatic MWE acquisition, since expressions acquired automatically from corpora can both speed them up and improve their quality. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into these and many other applications.
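The abstract does not detail the framework's modules. As a rough illustration of what corpus-based MWE acquisition involves, the sketch below shows one common step in the MWE literature: counting candidate word pairs in a tokenized corpus and ranking them with an association measure, here pointwise mutual information (PMI). It is a minimal sketch under stated assumptions. The input is assumed to be pre-tokenized sentences, candidates are restricted to adjacent bigrams with a hypothetical `min_freq` threshold, and the function names `bigram_candidates` and `rank_by_pmi` are hypothetical. None of this is the framework's actual API or method.

```python
import math
from collections import Counter

def bigram_candidates(sentences, min_freq=2):
    """Count unigrams and adjacent bigrams; keep bigrams seen at least min_freq times.

    Hypothetical helper: a real acquisition pipeline would typically also filter
    candidates by part-of-speech patterns, which this sketch omits.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    candidates = {bg: c for bg, c in bigrams.items() if c >= min_freq}
    return unigrams, candidates

def rank_by_pmi(sentences, min_freq=2):
    """Rank candidate bigrams by PMI = log2(P(w1 w2) / (P(w1) * P(w2))).

    Probabilities are maximum-likelihood estimates: unigram counts over the number
    of tokens, bigram counts over the number of adjacent token pairs.
    """
    unigrams, candidates = bigram_candidates(sentences, min_freq)
    n_tokens = sum(unigrams.values())
    n_pairs = sum(max(len(tokens) - 1, 0) for tokens in sentences)
    scored = []
    for (w1, w2), count in candidates.items():
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    # Higher PMI means the words co-occur more often than chance would predict.
    return sorted(scored, key=lambda item: item[1], reverse=True)

corpus = [
    "he gave up the fight".split(),
    "she gave up smoking".split(),
    "the fight was long".split(),
    "they gave the book to her".split(),
]
for pair, score in rank_by_pmi(corpus):
    print(" ".join(pair), round(score, 3))
```

On this toy corpus only "gave up" and "the fight" pass the frequency threshold. PMI is known to overrate rare pairs, which is why a minimum frequency is applied first. In practice, a lexicographer or an SMT pipeline would consume such a ranked list after further filtering.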
