A Bayesian Approach to Database Query Optimization

This paper applies Bayesian concepts to database query optimization. In relational database systems, users retrieve data by describing the data they want. The description takes the form of statements that specify operations on “relations” (as defined by set theory). A “query optimizer” then determines how the desired data is to be found. The optimizer may have several candidate algorithms available for each step in the query. Each algorithm affects the cost of evaluating the query differently, and no single algorithm is best in all cases. To choose among the alternatives, the optimizer uses summary statistics of the data stored in the database to estimate the cost of each algorithm. Because the statistics typically collected are only summaries, the true cost of using a given algorithm can only be estimated. As such, the query optimizer must choose algorithms in the presence of uncertai...