Building regression cost models for multidatabase systems

A major challenge in performing global query optimization in a multidatabase system (MDBS) is the lack of cost models for local database systems at the global level. The authors present a statistical procedure, based on multiple regression analysis, for building such cost models. They identify the explanatory variables that can be included in a regression model and present a mixed forward and backward method for selecting the significant ones. They also discuss measures for developing useful regression cost models: removing outliers, eliminating multicollinearity, validating regression model assumptions, and checking the significance of regression models. Experimental results demonstrate that the procedure can develop useful local cost models in an MDBS.
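The mixed forward and backward selection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the variable names (`operand_ktuples`, `result_ktuples`, `tuple_bytes`), the synthetic cost data, and the partial F thresholds `f_enter` and `f_remove` are all assumptions chosen for illustration, following the textbook stepwise regression scheme.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns.
    Returns (coefficients, residual sum of squares)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, t in zip(design, y))
    return beta, sse

def partial_f(sse_small, sse_big, n, k_big):
    """Partial F statistic for the single variable separating two nested models.
    k_big is the number of explanatory variables in the larger model."""
    return (sse_small - sse_big) / (sse_big / (n - k_big - 1))

def stepwise(rows, y, names, f_enter=4.0, f_remove=3.9):
    """Mixed selection: add the most significant variable, then drop any
    variable that has become insignificant. Thresholds are assumed values."""
    n, chosen = len(y), []
    for _ in range(2 * len(names)):          # hard bound on iterations
        _, sse_now = fit(rows, y, chosen)
        best = None
        for c in range(len(names)):          # forward step
            if c in chosen:
                continue
            _, sse_new = fit(rows, y, chosen + [c])
            f = partial_f(sse_now, sse_new, n, len(chosen) + 1)
            if f > f_enter and (best is None or f > best[0]):
                best = (f, c)
        if best is None:
            break
        chosen.append(best[1])
        _, sse_full = fit(rows, y, chosen)
        worst = None
        for c in chosen:                     # backward step
            rest = [v for v in chosen if v != c]
            _, sse_rest = fit(rows, y, rest)
            f = partial_f(sse_rest, sse_full, n, len(chosen))
            if f < f_remove and (worst is None or f < worst[0]):
                worst = (f, c)
        if worst is not None:
            chosen.remove(worst[1])
    beta, _ = fit(rows, y, chosen)
    return [names[c] for c in chosen], beta

# Synthetic "observed query costs" (hypothetical): cost depends on the operand
# and result sizes (in thousands of tuples) but not on the tuple length.
random.seed(0)
names = ["operand_ktuples", "result_ktuples", "tuple_bytes"]
rows = [[random.uniform(1, 100), random.uniform(0, 5), random.uniform(20, 200)]
        for _ in range(200)]
y = [2.0 + 0.3 * r[0] + 2.0 * r[1] + random.gauss(0, 1.0) for r in rows]

selected, beta = stepwise(rows, y, names)
coef = dict(zip(selected, beta[1:]))         # beta[0] is the intercept
print(selected, coef)
```

The sketch keeps `f_enter` above `f_remove`, the usual precaution against a variable being added and dropped repeatedly. A fuller treatment would also cover the other measures the abstract lists, such as outlier removal and multicollinearity checks, which are omitted here.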
