Paper Information - Study of Algorithms for Clustering Records in Document Databases

Study of Algorithms for Clustering Records in Document Databases

Response time of an information system can be improved by reducing the number of buckets accessed when retrieving a document set. One approach is to restructure the document base so that similar documents are placed close together in the file space, which raises the probability that the records identified by a query are collocated within the same bucket. This paper examines two algorithms proposed to solve the clustering problem, and analyzes and predicts the resulting density and retrieval times using random probability theory. Results suggest that, given an acceptable confidence interval, file properties before and after clustering can be predicted fairly accurately when the characteristic parameters of the file are known.

Keyterms: Information Retrieval, Database Systems, Clustering
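The effect described in the abstract can be illustrated with a minimal sketch. It is not either of the two algorithms the paper studies: the toy file, the single keyterm per document, the bucket size of 3, and the helper names `assign_buckets` and `buckets_accessed` are all assumptions made for illustration, and sorting by keyterm stands in for a real clustering step.

```python
from typing import Dict, List, Set


def assign_buckets(order: List[str], bucket_size: int) -> Dict[str, int]:
    """Map each document id to a bucket number based on its file position."""
    return {doc: pos // bucket_size for pos, doc in enumerate(order)}


def buckets_accessed(result_set: Set[str], layout: Dict[str, int]) -> int:
    """Count the distinct buckets that must be read to fetch result_set."""
    return len({layout[doc] for doc in result_set})


# Hypothetical toy file: nine documents, each tagged with one keyterm.
keyterm = {
    "d1": "a", "d2": "b", "d3": "c",
    "d4": "a", "d5": "b", "d6": "c",
    "d7": "a", "d8": "b", "d9": "c",
}

# Original file order versus an order that places similar documents together.
# Sorting by keyterm is only a stand-in for a clustering algorithm.
unclustered = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9"]
clustered = sorted(unclustered, key=lambda doc: keyterm[doc])

# A query that retrieves every document carrying keyterm "a".
query = {doc for doc, term in keyterm.items() if term == "a"}

BUCKET_SIZE = 3  # assumed bucket capacity, in documents
before = buckets_accessed(query, assign_buckets(unclustered, BUCKET_SIZE))
after = buckets_accessed(query, assign_buckets(clustered, BUCKET_SIZE))
print(f"buckets accessed before clustering: {before}, after: {after}")
```

In this toy file the three matching documents are spread over three buckets in the original order and fall into a single bucket once similar documents are adjacent, which is the reduction in bucket accesses the abstract refers to.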

Issam A. R. Moghrabi | Maisa A. Safar
