The effectiveness of query-specific hierarchic clustering in information retrieval

Hierarchic document clustering has been widely applied to information retrieval (IR) on the grounds that it may improve effectiveness over inverted file search (IFS). However, previous research has been inconclusive as to whether clustering actually brings improvements. In this paper we take the view that hierarchic clustering applied to search results (query-specific clustering) has the potential to increase retrieval effectiveness compared with both static clustering and conventional IFS. We conducted a number of experiments using five document collections and four hierarchic clustering methods. Our results show that the effectiveness of query-specific clustering is indeed higher, and suggest that there is scope for its application to IR.
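To make the idea of query-specific clustering concrete, the following is a minimal sketch rather than the experimental setup of the paper. It first runs an initial search, then builds a hierarchy over the retrieved documents only, instead of clustering the whole collection in advance as static clustering does. The bag-of-words representation, the cosine similarity, the group-average linkage, and the function names `retrieve` and `cluster_results` are all assumptions chosen for illustration; the abstract does not say which similarity measure or which four clustering methods were used.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed, illustrative representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, top_n):
    """Stand-in for the initial search: rank documents against the query, keep top_n ids."""
    q = vectorize(query)
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q, vectorize(docs[i])),
                    reverse=True)
    return ranked[:top_n]


def cluster_results(doc_ids, docs):
    """Group-average agglomerative clustering over the retrieved documents only.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    where each cluster is a tuple of document ids.
    """
    vectors = {i: vectorize(docs[i]) for i in doc_ids}
    clusters = [(i,) for i in doc_ids]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the highest average pairwise similarity.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
                sim = sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        sim, x, y = best
        merges.append((clusters[x], clusters[y], sim))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


# Toy collection (hypothetical data, for illustration only).
docs = [
    "hierarchic clustering of documents for retrieval",
    "clustering search results for retrieval",
    "inverted file search over documents",
    "recipes for baking bread",
]
top = retrieve("clustering retrieval", docs, top_n=3)
merges = cluster_results(top, docs)
for left, right, sim in merges:
    print(left, right, round(sim, 3))
```

In this toy run the hierarchy is built from three retrieved documents rather than from the full collection, so its structure depends on the query. A different query would retrieve a different set and produce a different hierarchy, which is the property that distinguishes query-specific from static clustering.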
