Automatic Acquisition of Formal Concepts from Text

This paper describes an unsupervised method for extracting concepts from part-of-speech annotated corpora. The method builds bidimensional clusters of both words and their lexico-syntactic contexts, and is based on Formal Concept Analysis (FCA). Each generated cluster is defined as a formal concept: a set of words forms the extension of the concept, and a set of contexts forms its intension, that is, the attributes (or properties) valid for all the words in the extension. The clustering process relies on two concept operations: abstraction and specification. Abstraction builds a more generic concept by intersecting the intensions of the merged concepts and taking the union of their extensions. By contrast, specification takes the union of the intensions and intersects the extensions. The result is a concept lattice that describes the domain-specific ontology underlying the training corpus.
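
The two concept operations can be illustrated with a minimal Python sketch. It follows only the set-theoretic description given in the abstract; the `Concept` class, the function names, and the example words and context labels (such as `drink_OBJ`) are hypothetical and are not taken from the paper. The paper's actual clustering procedure, including how concepts are selected for merging, is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A formal concept: words (extension) and shared contexts (intension)."""
    extension: frozenset
    intension: frozenset


def abstraction(a: Concept, b: Concept) -> Concept:
    # More generic concept: union of extensions, intersection of intensions.
    return Concept(a.extension | b.extension, a.intension & b.intension)


def specification(a: Concept, b: Concept) -> Concept:
    # More specific concept: intersection of extensions, union of intensions.
    return Concept(a.extension & b.extension, a.intension | b.intension)


# Hypothetical toy concepts; the context labels are invented for illustration.
c1 = Concept(frozenset({"wine", "beer"}), frozenset({"drink_OBJ", "bottle_of"}))
c2 = Concept(frozenset({"beer", "water"}), frozenset({"drink_OBJ", "glass_of"}))

general = abstraction(c1, c2)
specific = specification(c1, c2)

print(sorted(general.extension), sorted(general.intension))
print(sorted(specific.extension), sorted(specific.intension))
```

In this toy case, abstraction yields a concept covering all three words that keeps only the shared context, while specification yields a concept containing only the word common to both inputs together with all of their contexts.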
