Generation and search of clustered files

A classified, or clustered, file is one in which related or similar records are grouped into classes, or clusters, such that all items within a cluster are jointly retrievable. Clustered files adapt easily to both broad and narrow search strategies, and simple file updating methods are available. An inexpensive file clustering method applicable to large files is presented, together with appropriate file search methods. An abstract model is then introduced to predict the retrieval effectiveness of various search methods in a clustered file environment. Experimental evidence is included to test the versatility of the model and to demonstrate the role of various parameters in the cluster search process.
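To make the idea of broad and narrow searches over a clustered file concrete, the following is a minimal sketch and not the method of the paper. It assumes records are sparse term-weight vectors, that each cluster is summarized by the mean of its records (a centroid), and that similarity is the cosine measure. The names `cosine`, `centroid`, `cluster_search`, and the parameter `n_clusters` are hypothetical and chosen only for illustration. A narrow search examines only the best-matching cluster; a broad search examines more clusters.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(records):
    """Mean term-weight vector of a list of records (assumed cluster profile)."""
    total = {}
    for rec in records:
        for t, w in rec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(records) for t, w in total.items()}


def cluster_search(query, clusters, n_clusters):
    """Rank clusters by centroid similarity, then search only the top ones.

    clusters maps a cluster name to {record_id: vector}.
    n_clusters = 1 gives a narrow search; larger values broaden it.
    """
    ranked = sorted(
        clusters,
        key=lambda name: cosine(query, centroid(list(clusters[name].values()))),
        reverse=True,
    )
    hits = []
    for name in ranked[:n_clusters]:
        for rec_id, vec in clusters[name].items():
            score = cosine(query, vec)
            if score > 0:
                hits.append((score, rec_id))
    # Highest score first; ties broken by record id for a stable order.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [rec_id for _, rec_id in hits]


# Toy clustered file (illustrative data, not from the paper).
clusters = {
    "retrieval": {
        "d1": {"query": 1.0, "search": 1.0},
        "d2": {"search": 1.0, "index": 1.0},
    },
    "storage": {
        "d3": {"disk": 1.0, "page": 1.0},
        "d4": {"page": 1.0, "index": 1.0},
    },
}
query = {"search": 1.0, "index": 1.0}

narrow = cluster_search(query, clusters, n_clusters=1)
broad = cluster_search(query, clusters, n_clusters=2)
```

In this toy file the narrow search returns records only from the cluster whose centroid best matches the query, while the broad search also reaches a partially matching record in the second cluster, at the cost of examining more records.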
