Discovering and organizing noun-verb collocations in specialized corpora using inductive logic programming

This article presents a method for discovering and organizing noun-verb (N-V) combinations in a French corpus on computing. Our aim is to find N-V combinations in which the verb conveys a "realization" meaning as defined in the framework of lexical functions (Mel’čuk 1996, 1998). Our approach is chiefly corpus-based and relies on a machine learning technique, namely Inductive Logic Programming (ILP). The acquisition process is divided into three steps: (1) isolating contexts in which specific N-V pairs occur; (2) inferring linguistically motivated rules that reflect the behaviour of realization N-V pairs; (3) projecting these rules onto corpora to find other valid N-V pairs. The technique is evaluated in terms of the relevance of the inferred rules and in terms of the quality (recall and precision) of the results. The results show that our approach is able to find these very specific semantic relationships (the realization N-V pairs) with high success rates.
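Step (3) and the precision/recall evaluation can be illustrated with a minimal Python sketch. It does not implement ILP learning (step 2). Instead, the rule is a hand-written predicate standing in for an ILP-inferred clause. The `Context` class, its features (`relation`, `distance`), the helper names `project_rules` and `precision_recall`, and the toy data are all hypothetical and are not taken from the article. The sketch only shows two things: how rules are projected onto corpus contexts to collect candidate N-V pairs, and how those pairs are scored against a reference set of validated pairs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Context:
    """One occurrence of an N-V pair (hypothetical, simplified representation)."""
    noun: str
    verb: str
    relation: str  # assumed syntactic link between noun and verb, e.g. "obj" or "subj"
    distance: int  # assumed feature: number of tokens separating noun and verb


# A rule is any boolean test over a context. In the article, rules are clauses
# inferred by ILP; here they are hand-written stand-ins for illustration only.
Rule = Callable[[Context], bool]


def project_rules(rules: Iterable[Rule], contexts: Iterable[Context]) -> set[tuple[str, str]]:
    """Step (3): collect every (noun, verb) pair with at least one context covered by a rule."""
    rule_list = list(rules)
    return {(c.noun, c.verb) for c in contexts if any(rule(c) for rule in rule_list)}


def precision_recall(found: set[tuple[str, str]],
                     reference: set[tuple[str, str]]) -> tuple[float, float]:
    """Precision = correct / found, recall = correct / reference (0.0 when a set is empty)."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy contexts; the "realization" labels in `reference` are invented for the example.
    contexts = [
        Context("programme", "exécuter", "obj", 1),
        Context("fichier", "ouvrir", "obj", 1),
        Context("fichier", "contenir", "subj", 1),
        Context("ordinateur", "utiliser", "obj", 3),
    ]
    rules = [lambda c: c.relation == "obj" and c.distance <= 2]
    reference = {("programme", "exécuter"), ("fichier", "ouvrir"), ("ordinateur", "utiliser")}

    found = project_rules(rules, contexts)
    p, r = precision_recall(found, reference)
    print(sorted(found))            # [('fichier', 'ouvrir'), ('programme', 'exécuter')]
    print(f"P={p:.2f} R={r:.2f}")   # P=1.00 R=0.67
```

In this sketch a pair is accepted as soon as one of its contexts is covered by a rule. Adding rules can therefore only enlarge the set of pairs found. Recall can never decrease as a result, while precision may move in either direction. This is the trade-off that a recall/precision evaluation is meant to expose.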

[1] Igor Mel’čuk et al. Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques IV. 1999.

[2] Vincent Claveau et al. Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus Using Inductive Logic Programming. 2003, J. Mach. Learn. Res.

[3] Hinrich Schütze et al. Book Reviews: Foundations of Statistical Natural Language Processing. 1999, CL.

[4] Adam Kilgarriff et al. WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography. 2000.

[5] Darren Pearce. A Comparative Evaluation of Collocation Extraction Techniques. 2002, LREC.

[6] F. Hausmann et al. Un dictionnaire des collocations est-il possible? 1979.

[7] Marie-Claude L'Homme et al. Two methods for extracting "specific" single-word terms from specialized corpora: Experimentation and evaluation. 2005.

[8] Igor A. Mel’čuk et al. DEC: Dictionnaire explicatif et combinatoire du français contemporain. 1984.

[9] Ted Dunning. Accurate Methods for the Statistics of Surprise and Coincidence. 1993, CL.

[10] Frank Smadja. Retrieving Collocations from Text: Xtract. 1993, CL.

[11] Igor Mel’čuk et al. Lexical functions: a tool for the description of lexical relations in a lexicon. 1996.

[12] James Pustejovsky. The Generative Lexicon. 1995, CL.

[13] Thierry Fontenelle et al. Turning a bilingual dictionary into a lexical semantic database. 1997.

[14] Alain Polguère. Collocations et fonctions lexicales : pour un modèle d'apprentissage. 2003.

[15] Vincent Claveau et al. Acquisition of Qualia Elements from Corpora - Evaluation of a Symbolic Learning Method. 2002, LREC.

[16] Frank Smadja. Retrieving collocations from text. 1993.

[17] Luc De Raedt et al. Inductive Logic Programming: Theory and Methods. 1994, J. Log. Program.

[18] Gregory Grefenstette et al. Explorations in automatic thesaurus discovery. 1994.