Retrieval of Relevant Concepts from a Text Collection

This paper addresses the characterization of a large text collection by introducing a method for retrieving sets of relevant WordNet concepts as descriptors of the collection's contents. The method combines models for identifying interesting word co-occurrences with an extension of a word sense disambiguation algorithm in order to retrieve the concepts that best fit the collection's topics. The retrieved concepts can include multi-word nominal concepts that do not appear explicitly in the texts. We evaluate our proposal using extensions of recall and precision that are also introduced in this paper.