Learning Concept Hierarchies from Text with a Guided Hierarchical Clustering Algorithm

We present an approach for the automatic induction of concept hierarchies from text collections. We propose a novel guided agglomerative hierarchical clustering algorithm that uses a hypernym oracle to drive the clustering process. By integrating the hypernym oracle directly into the clustering algorithm, we overcome two main problems of unsupervised clustering approaches that rely on the distributional similarity of terms to induce concept hierarchies. First, by clustering two terms only if they have a hypernym in common, we ensure that each resulting cluster is reasonable. Second, by labeling each cluster with the corresponding hypernym, we overcome the labeling problem shared by all unsupervised approaches. We compare our approach with Caraballo’s method, assessing the quality of the automatically learned ontologies against a handcrafted taxonomy for the tourism domain using the similarity measures of Maedche et al. We also present a human evaluation of the concept hierarchy produced by our guided algorithm.
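The two ideas named in the abstract (merge two terms only when they share a hypernym, and label the result with that hypernym) can be illustrated with a small sketch. This is not the paper's algorithm: it produces a flat grouping rather than a full hierarchy, and the function names, the toy context vectors, the dictionary-based oracle, and the tie-breaking rule (alphabetically first common hypernym) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def guided_clusters(vectors, oracle):
    """Group terms under shared hypernyms, visiting pairs by descending similarity.

    vectors: term -> {context feature: count}  (hypothetical toy representation)
    oracle:  term -> set of hypernyms          (hypothetical stand-in for the oracle)
    """
    label_of = {}   # term -> hypernym label it was clustered under
    clusters = {}   # hypernym label -> set of terms
    pairs = sorted(
        combinations(sorted(vectors), 2),
        key=lambda p: cosine(vectors[p[0]], vectors[p[1]]),
        reverse=True,
    )
    for a, b in pairs:
        if a in label_of and b in label_of:
            continue  # both terms are already placed
        if a in label_of or b in label_of:
            placed, free = (a, b) if a in label_of else (b, a)
            label = label_of[placed]
            # Join an existing cluster only if the oracle confirms its label.
            if label not in oracle.get(free, set()):
                continue
        else:
            common = oracle.get(a, set()) & oracle.get(b, set())
            if not common:
                continue  # no shared hypernym: the pair is not merged
            label = min(common)  # assumed tie-break: alphabetically first
        for term in (a, b):
            label_of[term] = label
            clusters.setdefault(label, set()).add(term)
    return clusters


# Toy tourism-style data, invented for this example.
vectors = {
    "hotel": {"book": 3, "stay_at": 2},
    "hostel": {"book": 2, "stay_at": 3},
    "bus": {"ride": 3, "book": 1},
    "train": {"ride": 2, "book": 1},
}
oracle = {
    "hotel": {"accommodation"},
    "hostel": {"accommodation"},
    "bus": {"vehicle"},
    "train": {"vehicle"},
}

result = guided_clusters(vectors, oracle)
```

Here `result` maps each hypernym label to the terms grouped under it. Pairs such as "hotel" and "bus" are never merged, however similar their contexts, because the oracle gives them no common hypernym.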

[1] Steffen Staab, et al. Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text, 2004, ECAI.

[2] Walter Daelemans, et al. Unsupervised Text Mining for Ontology Extraction: An Evaluation of Statistical Measures, 2004, LREC.

[3] Zellig S. Harris, et al. Mathematical structures of language, 1968, Interscience tracts in pure and applied mathematics.

[4] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[5] Shigeo Abe. Pattern Classification, 2001, Springer London.

[6] Marta Sabou, et al. Learning web service ontologies: an automatic extraction method and its evaluation, 2005.

[7] Steffen Staab, et al. Measuring Similarity between Ontologies, 2002, EKAW.

[8] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.

[9] Naftali Tishby, et al. Distributional Clustering of English Words, 1993, ACL.

[10] Steffen Staab, et al. The Karlsruhe view on ontologies, 2003.

[11] Donald Hindle, et al. Noun Classification From Predicate-Argument Structures, 1990, ACL.

[12] David Faure, et al. A corpus-based conceptual clustering method for verb frames and ontology, 1998.

[13] Gilles Bisson, et al. Designing Clustering Methods for Ontology Building - The Mo'K Workbench, 2000, ECAI Workshop on Ontology Learning.

[14] Helmut Schmid, et al. Probabilistic part-of-speech tagging using decision trees, 1994.

[15] David G. Stork, et al. Pattern Classification, 1973.

[16] Steffen Staab, et al. Learning Taxonomic Relations from Heterogeneous Evidence, 2004.

[17] Paola Velardi, et al. Using text processing techniques to automatically enrich a domain ontology, 2001, FOIS.

[18] Sharon A. Caraballo. Automatic construction of a hypernym-labeled noun hierarchy from text, 1999, ACL.