A fusion approach to cluster labeling

We present a novel approach to the cluster labeling task based on fusion methods. The core idea is to weigh the labels suggested by each labeler according to that labeler's estimated decisiveness with respect to each of its suggested labels. We hypothesize that a cluster labeler's choice of label for a given cluster should remain stable even when the cluster data are slightly incomplete. In experiments with state-of-the-art cluster labeling and data fusion methods over a large collection of clusters, we demonstrate that, overall, the cluster labeling fusion methods that additionally consider the labeler's decisiveness provide the best labeling performance.
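The abstract does not specify how decisiveness is estimated or which fusion rule is applied. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that a labeler's decisiveness for a label is the fraction of randomly sub-sampled versions of the cluster (a `keep` share of its documents, modelling "slightly incomplete" data) on which the labeler still ranks that label in its top k. It also swaps in a CombSUM-style fusion that sums decisiveness-weighted reciprocal-rank scores. All names (`top_k`, `decisiveness`, `fuse`, `tf_labeler`, `first_doc_labeler`) and parameters (`k`, `keep`, `n_samples`, `seed`) are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a labeler maps a cluster (a list of documents)
# to a ranked list of candidate labels, best first.
Labeler = Callable[[Sequence[str]], List[str]]


def top_k(labeler: Labeler, docs: Sequence[str], k: int) -> List[str]:
    """The labeler's first k distinct labels for the given documents."""
    return list(dict.fromkeys(labeler(docs)))[:k]


def decisiveness(labeler: Labeler, cluster: Sequence[str], k: int = 5,
                 n_samples: int = 20, keep: float = 0.75,
                 seed: int = 0) -> Dict[str, float]:
    """Assumed estimator: for each label in the labeler's top-k on the full
    cluster, the fraction of random sub-samples (a `keep` share of the
    documents) on which the labeler still places that label in its top-k."""
    rng = random.Random(seed)
    full = top_k(labeler, cluster, k)
    m = max(1, round(keep * len(cluster)))
    hits: Dict[str, int] = defaultdict(int)
    for _ in range(n_samples):
        sampled = set(top_k(labeler, rng.sample(list(cluster), m), k))
        for label in full:
            if label in sampled:
                hits[label] += 1
    return {label: hits[label] / n_samples for label in full}


def fuse(labelers: Sequence[Labeler], cluster: Sequence[str],
         k: int = 5, **kwargs) -> List[str]:
    """CombSUM-style fusion (an assumed choice): each labeler adds a
    reciprocal-rank score for each of its top-k labels, scaled by its
    estimated decisiveness for that label."""
    scores: Dict[str, float] = defaultdict(float)
    for labeler in labelers:
        weight = decisiveness(labeler, cluster, k=k, **kwargs)
        for rank, label in enumerate(top_k(labeler, cluster, k), start=1):
            scores[label] += weight[label] / rank
    return sorted(scores, key=lambda lbl: (-scores[lbl], lbl))


# Two toy labelers, for illustration only.
def tf_labeler(docs: Sequence[str]) -> List[str]:
    """Most frequent words across the cluster (ties broken alphabetically)."""
    counts = Counter(w for d in docs for w in d.split())
    return sorted(counts, key=lambda w: (-counts[w], w))


def first_doc_labeler(docs: Sequence[str]) -> List[str]:
    """Words of whichever document comes first: deliberately unstable."""
    return docs[0].split()


cluster = ["jaguar car engine", "car engine speed",
           "car racing engine", "car dealer price"]
print(fuse([tf_labeler, first_doc_labeler], cluster)[:3])
```

In this toy run, "jaguar" is the unstable labeler's top suggestion only because of document order, so it loses weight whenever sub-sampling changes the first document. "car" survives every perturbation for both labelers and ends up ranked first. Reciprocal-rank scores are used here only because they need no score normalization across labelers; a real system could plug in any fusion rule.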
