Automatic Acquisition of GL Resources, Using an Explanatory, Symbolic Technique

This chapter presents a symbolic machine learning method that automatically infers corpus-specific morpho-syntactic and semantic patterns conveying qualia relations. The method learns from descriptions of noun-verb (N-V) pairs found in a corpus, in which the verb either does or does not play one of the qualia roles of the noun. The resulting patterns are explanatory and linguistically motivated, and can be applied to a corpus to efficiently extract GL resources and populate Generative Lexicons. The linguistic relevance of these patterns is examined, and the N-V qualia pairs that they can and cannot detect are discussed. Comparisons with other methods for corpus-based qualia pair extraction are also presented.
