The difficulty of optimum index selection

Given a file on a secondary store in which each record has several attributes, it is usually advantageous to build an index mechanism to decrease the cost of conducting transactions to the file. The problem of selecting attributes over which to index has been studied in the context of various storage structures and access assumptions. One algorithm to make an optimum index selection requires 2k steps in the worst case, where k is the number of attributes in the file. We examine the question of whether a more efficient algorithm might exist and show that even under a simple cost criterion the problem is computationally difficult in a precise sense. Our results extend directly to other related problems where the cost of the index depends on fixed values which are assigned to each attribute. Some practical implications are discussed.