Clustering Concept Hierarchies from Text

We present a novel approach to learning taxonomies, or concept hierarchies, from text. The approach uses Formal Concept Analysis (FCA), a method mainly used for data analysis, i.e., for investigating and processing explicitly given information. It rests on the distributional hypothesis, i.e., that nouns or terms are similar to the extent that they share contexts. We further assume that verbs impose more or less strong selectional restrictions on their arguments. The concept hierarchy is built via FCA, using syntactic dependencies as attributes. We evaluate the approach by comparing the produced concept hierarchies against handcrafted taxonomies from two domains: tourism and finance. We also compare its results against a hierarchical bottom-up clustering algorithm and against Bi-Section-Kmeans, an instance of a top-down clustering algorithm.
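
To make the lattice construction concrete, the following is a minimal sketch of Formal Concept Analysis on a toy formal context. It assumes that each noun has already been mapped to a set of attributes derived from the verbs it occurs with (for example, the object of "book" yields "bookable"). The nouns, attributes, and the function names formal_concepts and hasse_edges are illustrative assumptions, not data or code from the paper, and the naive enumeration is only suitable for very small contexts.

```python
# Toy formal context (hypothetical data): noun -> attributes derived from
# verbs the noun occurs with as an argument.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}


def formal_concepts(ctx):
    """Return all formal concepts as (extent, intent) pairs of frozensets.

    Every intent is an intersection of object attribute sets, or the full
    attribute set, so we close the full set under intersection with each
    object's attributes and then look up the matching extent.
    """
    all_attrs = frozenset().union(*ctx.values())
    intents = {all_attrs}
    for attrs in ctx.values():
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        # Extent: every object that carries all attributes of the intent.
        extent = frozenset(g for g, attrs in ctx.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extents (most general concepts) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


def hasse_edges(concepts):
    """Return (child_extent, parent_extent) pairs for direct subconcepts."""
    extents = [extent for extent, _ in concepts]
    edges = []
    for child in extents:
        for parent in extents:
            if child < parent and not any(child < mid < parent for mid in extents):
                edges.append((child, parent))
    return edges


concepts = formal_concepts(context)
edges = hasse_edges(concepts)

for extent, intent in concepts:
    print(sorted(extent), "<->", sorted(intent))
```

In this sketch, a concept's extent is a group of nouns and its intent is the set of attributes they all share; ordering concepts by extent inclusion gives the hierarchy. Real corpora would produce far larger and noisier contexts, for which a dedicated lattice algorithm would be more appropriate than this enumeration.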
