Making sense of collocations

Lexico-semantic collocations (LSCs) are a prominent type of multiword expression. Over the last decade, a significant number of works have addressed the automatic compilation of LSCs from text corpora. Very often, however, the output of an LSC-extraction program is a plain list of LSCs. While useful as raw material for dictionary construction, plain lists of LSCs are of rather limited use in NLP applications. For NLP, LSCs must be assigned syntactic and, especially, semantic information. Our goal is to develop an “off-the-shelf” LSC-acquisition program that annotates each LSC identified in the corpus with its syntax and semantics. In this article, we treat the annotation task as a classification task and approach it as a machine learning problem. The LSC typology we use is that of the lexical functions from Explanatory Combinatorial Lexicology; EuroWordNet serves as the lexico-semantic resource. The machine learning technique applied is a variant of the nearest-neighbor family, defined over lexico-semantic features of the elements of LSCs. The technique has been tested on Spanish verb–noun bigrams. © 2005 Elsevier Ltd. All rights reserved.
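
As a rough illustration of the idea of nearest-neighbor classification over lexico-semantic features, the following minimal sketch labels a verb–noun bigram with the lexical function that is most frequent among its most similar training examples. It is not the article's actual algorithm: the Jaccard similarity, the value of `k`, the feature strings, and the toy training set are all assumptions made up for this example. In a real setting the features would be drawn from a resource such as EuroWordNet (for instance, hypernyms of the verb and the noun).

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(features, training, k=3):
    """Return the majority label among the k training examples most similar to `features`."""
    ranked = sorted(training, key=lambda ex: jaccard(features, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each example pairs invented semantic features of a
# verb ("v:...") and a noun ("n:...") with a lexical function label.
TRAINING = [
    (frozenset({"v:give", "v:act", "n:walk", "n:activity"}), "Oper1"),
    (frozenset({"v:make", "v:act", "n:decision", "n:activity"}), "Oper1"),
    (frozenset({"v:fulfil", "v:complete", "n:promise", "n:commitment"}), "Real1"),
    (frozenset({"v:keep", "v:complete", "n:word", "n:commitment"}), "Real1"),
]

# An unseen bigram, again described with invented features.
candidate = frozenset({"v:take", "v:act", "n:trip", "n:activity"})
predicted = classify(candidate, TRAINING)
print(predicted)
```

Set overlap is used here only because it is the simplest similarity measure for symbolic features; a weighted or hierarchy-aware measure could be substituted without changing the overall structure.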
