Automated Taxonomy Generation for Summarizing Multi-Type Relational Datasets

Taxonomy construction gives users an efficient way to navigate and browse large amounts of information by organizing it into a small number of hierarchical clusters. Compared with manually edited taxonomies, Automated Taxonomy Generation (ATG) has numerous advantages and has therefore been applied to categorize document collections. However, its utility for organizing and representing relational datasets has not been investigated, because of its prohibitive computational complexity. In this paper we propose a new ATG method based on the relational clustering framework DIVA. Incorporating the idea of Representative Objects greatly reduces the computational complexity. Moreover, we analyze the divergence of the data attributes and label the taxonomic nodes accordingly. The quality of the derived taxonomy is quantitatively evaluated by a synthesized criterion that considers both intra-node homogeneity and inter-node heterogeneity. Theoretical analysis and experimental results show that our approach is comparable in effectiveness to other ATG algorithms and more efficient.
