Optimal information retrieval when queries are not random

Abstract: We consider the complexity of the general information retrieval system design problem and of the design problem for multiattribute file systems based upon multiple key hashing (MKH). We first show that the problem of designing an optimal multiattribute file system is NP-hard. We then derive the performance formula for multiattribute file systems based upon the MKH method, and show that the design problem for such a file system is related to the prime number problem. We also show that the problem of designing optimal multiattribute files based upon the MKH method can be reduced to finding minimal N-tuples, a problem discussed by Chang, Lee and Du. We further present a very efficient method for designing good multiple key hashing functions when the number of buckets is a power of a prime number, and we propose a heuristic algorithm for designing good multiple key hashing functions in the general case.
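The abstract does not spell out the MKH scheme or its performance formula, so the following is only a minimal sketch under the standard textbook formulation: attribute i is hashed into one of m_i values, a bucket address is the tuple of those values, and a partial-match query must examine the product of m_i over its unspecified attributes. The function names, the use of Python's built-in `hash`, and the `query_probs` mapping are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product
from math import prod

def mkh_address(record, sizes):
    """Map a record (tuple of attribute values) to a bucket address tuple.

    Assumption: attribute i is hashed with Python's built-in hash, reduced
    modulo sizes[i]. Any per-attribute hash function could be substituted.
    """
    return tuple(hash(v) % m for v, m in zip(record, sizes))

def buckets_for_query(query, sizes):
    """List every bucket address a partial-match query must examine.

    None marks an unspecified attribute, which can take any of its m_i
    hash values; a specified attribute fixes its coordinate.
    """
    axes = [range(m) if v is None else [hash(v) % m]
            for v, m in zip(query, sizes)]
    return list(product(*axes))

def expected_buckets(sizes, query_probs):
    """Expected number of buckets examined per query.

    query_probs maps a frozenset of specified attribute indices to the
    probability of that query type (hypothetical input format).
    """
    return sum(
        p * prod(m for i, m in enumerate(sizes) if i not in spec)
        for spec, p in query_probs.items()
    )

# Two attributes, 8 buckets in total, queries that usually specify attribute 0.
skewed = {frozenset({0}): 0.9, frozenset({1}): 0.1}
cost_a = expected_buckets((2, 4), skewed)  # few hash values for attribute 0
cost_b = expected_buckets((4, 2), skewed)  # more hash values for attribute 0
```

With the same total of 8 buckets, the two allocations give different expected costs once the query types are not equally likely, which is the kind of trade-off a design procedure for MKH functions has to resolve.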

[1] Chin-Chen Chang, et al. Some properties of Cartesian product files, 1980, SIGMOD '80.

[2] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[3] Ronald L. Rivest, et al. Partial-Match Retrieval Algorithms, 1976, SIAM J. Comput.

[4] Sakti P. Ghosh. Data base organization for data management, 1977.

[5] Vaughan R. Pratt, et al. Every Prime has a Succinct Certificate, 1975, SIAM J. Comput.

[6] Gary L. Miller. Riemann's Hypothesis and Tests for Primality, 1976, J. Comput. Syst. Sci.

[7] S. Bing Yao, et al. Multi-dimensional clustering for data base organizations, 1977, Inf. Syst.

[8] Alfred V. Aho, et al. Optimal partial-match retrieval when fields are independently specified, 1979, ACM Trans. Database Syst.

[9] Richard C. T. Lee, et al. Common Properties of Some Multiattribute File Systems, 1979, IEEE Transactions on Software Engineering.

[10] Matti Jakobsson, et al. Reducing block accesses in inverted files by partial clustering, 1980, Inf. Syst.

[11] Chin-Chen Chang, et al. The hierarchical ordering in multiattribute files, 1983, Inf. Sci.

[12] Chuan Yi Tang, et al. On the complexity of some multi-attribute file design problems, 1985, Inf. Syst.

[13] Chin-Chen Chang, et al. Performance Analyses of Cartesian Product Files and Random Files, 1984, IEEE Transactions on Software Engineering.

[14] Jon Louis Bentley, et al. Data Structures for Range Searching, 1979, CSUR.

[15] James B. Rothnie, et al. Attribute based file organization in a paged memory environment, 1974, CACM.

[16] Azad Bolour. Optimality Properties of Multiple-Key Hashing Functions, 1979, JACM.

[17] Chin-Chen Chang, et al. Symbolic Gray Code as a Perfect Multiattribute Hashing Scheme for Partial Match Queries, 1982, IEEE Transactions on Software Engineering.