Study of Algorithms for Clustering Records in Document Databases

The response time of an information system can be improved by reducing the number of buckets accessed when retrieving a set of documents. One approach is to restructure the document base so that similar documents are placed close together in the file space. This increases the probability that the records identified by a query are collocated within the same bucket. This paper examines two algorithms proposed to solve the clustering problem, and uses probability theory to analyze and predict the file density and retrieval times that result from applying them. The results suggest that, given an acceptable confidence interval, file properties before and after clustering can be predicted fairly accurately when the characteristic parameters of the file are known.

Keyterms: Information Retrieval, Database Systems, Clustering