An exponential open hashing function based on dynamical systems theory

Hashing is among the oldest and most widely used techniques in computer science, dating to 1953. This dissertation examines current theory on hash functions and proposes new techniques for analyzing open-address hash functions. In particular, the uniform hashing properties are shown to be insufficient to characterize the overall performance of a hash function. Instead, it is asserted that the properties of a good hash function are the same as the necessary conditions for chaos. Using measures from dynamical systems theory, these properties can be quantified and improved upon to create better open-address hash functions. A new exponential hash function is derived and shown to outperform the current state of the art, double hashing, on many clustered data sets. While it does not always outperform double hashing, exponential hashing suffers none of the performance peaks and valleys that double hashing does. Actual performance measurements are supported by dynamical system measurements in both the integer and real domains. In addition, a new spatial bound for minimal perfect hashing is derived. It is shown that the complexity of a hash function is no greater than the algorithmic complexity of the data set being hashed. As a result, the minimum perfect hash function size can be bounded by the entropy of the data set, since the number of small perfect hash functions is very limited. Further, it is shown that minimal perfect hash functions are not computable for general data sets.
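
For readers unfamiliar with the baseline named above, the following is a minimal sketch of textbook double hashing in an open-address table. It is an illustration only: the function names, the prime table size, and the choice of secondary hash are assumptions made here, not definitions taken from the dissertation. The exponential hash function is not sketched because the abstract does not state its form.

```python
def double_hash_probes(key, m):
    """Return the probe sequence for an integer key in a table of size m.

    Assumes m is prime (hypothetical setup), so any step in 1..m-1 is
    coprime to m and the sequence visits every slot exactly once.
    """
    h1 = key % m                 # primary hash: starting slot
    h2 = 1 + key % (m - 1)       # secondary hash: step size, never zero
    return [(h1 + i * h2) % m for i in range(m)]


def insert(table, key):
    """Insert key into an open-address table; return the number of probes used."""
    m = len(table)
    for probes, slot in enumerate(double_hash_probes(key, m), start=1):
        if table[slot] is None:  # first empty slot along the probe sequence
            table[slot] = key
            return probes
    raise RuntimeError("table is full")


table = [None] * 7               # small prime-sized table for illustration
first = insert(table, 10)        # lands in its home slot
second = insert(table, 3)        # collides with 10, then steps by h2
```

The number of probes per insertion is the kind of performance figure the abstract refers to: because the step size depends on the key, keys that collide at the same home slot usually follow different probe sequences.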