The Optimal Selection of Secondary Indices for Files

Abstract

We consider the problem of finding an optimal set of indices for a file. A general model of the file is assumed, together with a probabilistic model of the transactions conducted against it: queries, updates, insertions, and deletions. It is shown that all the information the model assumes for each attribute can be condensed into two parameters, and that properties of the optimal solution can be derived from this condensed information. An algorithm that finds the optimal set of indices from these properties is presented.
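The abstract does not specify the file model, the two condensed per-attribute parameters, or the algorithm itself. The Python sketch below is only a reference point and is not the paper's method. It solves the selection problem by exhaustive search over all 2^n subsets of attributes under a deliberately simple cost model. Every name and number in it is hypothetical: `ATTRS`, `SCAN_COST`, `MODIFY_PROB`, and the per-attribute triple (query probability, indexed query cost, index maintenance cost) are illustrative. The point of an algorithm like the one announced above is to avoid this exponential enumeration.

```python
from itertools import combinations
from typing import FrozenSet, Tuple

# Hypothetical per-attribute parameters (illustrative only, NOT from the paper):
#   (probability a transaction is a query on this attribute,
#    cost of answering that query through an index,
#    cost of maintaining the index per update/insertion/deletion)
ATTRS = {
    "a": (0.30, 2.0, 4.0),
    "b": (0.05, 2.0, 4.0),
    "c": (0.20, 3.0, 1.0),
}
SCAN_COST = 20.0    # hypothetical cost of answering a query by scanning the file
MODIFY_PROB = 0.45  # hypothetical probability a transaction modifies the file


def expected_cost(indexed: FrozenSet[str]) -> float:
    """Expected cost per transaction when exactly the attributes in `indexed` have indices."""
    cost = 0.0
    for name, (query_prob, indexed_cost, maintenance) in ATTRS.items():
        if name in indexed:
            # Queries use the index, but every modification must also update it.
            cost += query_prob * indexed_cost + MODIFY_PROB * maintenance
        else:
            # Without an index, queries on this attribute fall back to a scan.
            cost += query_prob * SCAN_COST
    return cost


def best_index_set() -> Tuple[FrozenSet[str], float]:
    """Exhaustively evaluate every subset of attributes and keep the cheapest."""
    names = sorted(ATTRS)
    best_set: FrozenSet[str] = frozenset()
    best_cost = expected_cost(best_set)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            candidate = frozenset(combo)
            cost = expected_cost(candidate)
            if cost < best_cost:
                best_set, best_cost = candidate, cost
    return best_set, best_cost


if __name__ == "__main__":
    chosen, cost = best_index_set()
    print(sorted(chosen), round(cost, 2))
```

With these made-up numbers, indexing `a` and `c` pays off and indexing the rarely queried `b` does not. Because this toy model is additive across attributes, each attribute could in fact be decided independently. The exhaustive loop is kept only to show the general search space that a smarter algorithm has to prune.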