Application of Query Sensitive Similarity Measure in IR systems

Document clustering has been widely used in information retrieval systems to improve both the efficiency and the effectiveness of ranked-output systems, based on the cluster hypothesis. This hypothesis states that relevant documents tend to be more similar to each other than to non-relevant documents, and therefore tend to appear in the same clusters. So far, the cluster hypothesis has been tested experimentally only for static clustering and query-specific clustering using the cosine similarity measure. On the other hand, the effectiveness of document clustering with a Query-Sensitive Similarity Measure (QSSM) has been studied only with the N-Nearest Neighbor test on very small, topic-specific document collections. In this paper, the cluster hypothesis for query-specific clustering is investigated experimentally using a query-sensitive similarity measure and a large document collection. The results show that the cluster hypothesis holds for query-specific clustering with the employed QSSM, and that using this QSSM increases the effectiveness of query-specific clustering.
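The abstract does not spell out the QSSM employed or the exact test protocol, so the following Python sketch is only a rough illustration under stated assumptions. It contrasts plain cosine similarity with one simple query-sensitive form, cosine(d_i, d_j) × cosine(c, q), where c holds the terms shared by the two documents. This form follows the spirit of Tombros and van Rijsbergen's measures and is not necessarily the measure studied in this paper. Averaging the two documents' weights for the shared terms is an assumption. The sketch then applies an N-Nearest Neighbor test: for each relevant document, it counts how many relevant documents appear among its n most similar documents. The functions `cosine`, `qssm` and `nn_test`, and the toy documents, are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set

# Hypothetical sparse representation: term -> weight (e.g. tf-idf).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def qssm(a: Vector, b: Vector, query: Vector) -> float:
    """Illustrative query-sensitive similarity: cosine(a, b) scaled by how
    well the terms shared by a and b match the query. Averaging the shared
    terms' weights is an assumption, not taken from the paper."""
    common = {t: (a[t] + b[t]) / 2.0 for t in a if t in b}
    return cosine(a, b) * cosine(common, query)


def nn_test(docs: Dict[str, Vector], relevant: Set[str],
            sim: Callable[[Vector, Vector], float], n: int = 5) -> float:
    """N-Nearest Neighbor test: average number of relevant documents
    among the n nearest neighbours of each relevant document."""
    scores: List[int] = []
    for rid in relevant:
        others = [d for d in docs if d != rid]
        # Rank the other documents by decreasing similarity to rid.
        others.sort(key=lambda d: sim(docs[rid], docs[d]), reverse=True)
        scores.append(sum(1 for d in others[:n] if d in relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical data): r1 shares many "sports" terms with the
# non-relevant n1, but only r1 and r2 share the query term "cluster".
query = {"cluster": 1.0, "retrieval": 1.0}
docs = {
    "r1": {"sports": 3.0, "cluster": 1.0},
    "r2": {"cluster": 1.0, "retrieval": 1.0},
    "n1": {"sports": 3.0, "music": 1.0},
}
relevant = {"r1", "r2"}

cos_score = nn_test(docs, relevant, cosine, n=1)
qssm_score = nn_test(docs, relevant, lambda a, b: qssm(a, b, query), n=1)
print(cos_score, qssm_score)  # 0.5 1.0
```

In this toy case, cosine pairs r1 with the non-relevant n1 because of their shared "sports" terms. The query-sensitive form discounts that overlap because none of the shared terms occur in the query. As a result, each relevant document's nearest neighbour becomes the other relevant document.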
