Using interdocument similarity information in document retrieval systems

The first part of this paper reports a comparative study of the document classifications produced by the single linkage, complete linkage, group average, and Ward clustering methods. Studies of cluster membership and of the effectiveness of cluster searches support previous findings that the single linkage classifications are rather different from those produced by the other three methods. These latter methods all produce large numbers of small clusters containing just pairs of documents. This finding motivates the work reported in the second part of the paper, which considers the use of clusters consisting of a document together with the document to which it is most similar. A comparison of searches of such clusters with conventional best match searches, using seven document test collections, suggests that the two types of search are of comparable effectiveness but retrieve noticeably different sets of relevant documents. © 1986 John Wiley & Sons, Inc.
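
The idea of a cluster made up of a document and its most similar document can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: documents are modelled as sets of index terms, similarity is the Dice coefficient, a cluster is represented by the union of its two term sets, and the function names and toy data are hypothetical.

```python
def dice(a, b):
    """Dice coefficient between two term sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def nearest_neighbor_clusters(docs):
    """Pair each document with the document it is most similar to.

    Requires at least two documents. Ties go to the lowest document index.
    """
    clusters = []
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        nn = max(others, key=lambda j: dice(d, docs[j]))
        clusters.append((i, nn))
    return clusters


def best_match_search(query, docs, k):
    """Conventional best match search: rank documents against the query."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: dice(query, docs[i]),
                    reverse=True)
    return ranked[:k]


def cluster_search(query, docs, clusters, k):
    """Rank nearest neighbor clusters against the query and return their members.

    Each cluster is represented by the union of its two term sets,
    which is an assumption of this sketch.
    """
    ranked = sorted(clusters,
                    key=lambda c: dice(query, docs[c[0]] | docs[c[1]]),
                    reverse=True)
    retrieved = []
    for cluster in ranked:
        for i in cluster:
            if i not in retrieved:
                retrieved.append(i)
    return retrieved[:k]


# Hypothetical toy collection: each document is a set of index terms.
docs = [
    {"cluster", "document", "retrieval"},
    {"cluster", "document", "hierarchy"},
    {"boolean", "query", "ranking"},
    {"boolean", "query", "index"},
]
query = {"document", "hierarchy"}

clusters = nearest_neighbor_clusters(docs)
print(best_match_search(query, docs, 2))
print(cluster_search(query, docs, clusters, 2))
```

In this sketch the two searches can return the same documents in a different order, or different documents altogether, because the cluster search scores a pair of documents jointly rather than each document on its own.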
