Corpus-based learning of noun-verb pairs for building a generative lexicon

NLP systems that perform disambiguation and rephrasing require a fine-grained description of the semantics of lexical units. In this paper we describe a means of automatically extracting such information from corpora, within the framework of Pustejovsky’s Generative Lexicon. In one component of this lexical model, called the qualia structure, words are described in terms of semantic roles. The qualia structure of a noun consists mainly of verbal associations that encode relational information. For example, the French verb mesurer (to measure) expresses the telic role of the noun jaugeur (gauger). Our aim is, for a given noun (N), to automatically extract from a corpus the verbs (V) that could belong to its qualia structure. More precisely, we describe a method based on learning techniques within the Inductive Logic Programming framework that distinguishes, in the corpus, N-V pairs that are linked by a semantic relation from pairs that are not. A comparison with a chi-square (χ²) score shows that the method is very promising, not only because it detects a large proportion of the relevant pairs, but also because it provides information that can be used to build linguistic rules.
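To make the statistical baseline concrete, the following is a minimal sketch of how a chi-square association score for an N-V pair could be computed from co-occurrence counts. It is an illustration, not the scoring setup used in the paper: the function name `chi_square_2x2`, the choice of counting unit (for instance, sentences containing the noun, the verb, or both), and the example counts are all assumptions.

```python
def chi_square_2x2(n_nv, n_n, n_v, total):
    """Pearson chi-square for a 2x2 noun/verb co-occurrence table.

    n_nv  : number of units (e.g. sentences) containing both noun and verb
    n_n   : number of units containing the noun
    n_v   : number of units containing the verb
    total : total number of units in the corpus
    All names and the counting unit are illustrative assumptions.
    """
    a = n_nv                      # noun and verb together
    b = n_n - n_nv                # noun without the verb
    c = n_v - n_nv                # verb without the noun
    d = total - n_n - n_v + n_nv  # neither noun nor verb
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return total * (a * d - b * c) ** 2 / denom


# Hypothetical counts, invented for illustration only.
score = chi_square_2x2(n_nv=10, n_n=20, n_v=20, total=100)
print(score)
```

A high score only indicates that the noun and the verb co-occur more (or less) often than chance would predict; unlike the learned rules described above, it says nothing about whether the pair is linked by a qualia relation.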