Fuzzy Statistics Estimation in Supporting Multidatabase Query Optimization

Advances in networking and database technology have made global information sharing a reality. Multidatabase systems (MDBSs) are a promising approach to achieving interoperability among multiple pre-existing databases that are highly autonomous and possibly heterogeneous. The performance of an MDBS depends heavily on the effectiveness of multidatabase query optimization (MQO). However, the unavailability of, and uncertainty in, the statistics essential to query optimization make MQO significantly more challenging than distributed query optimization. This research developed a fuzzy-statistics-based MQO approach to address the statistics estimation and uncertainty problems in an MDBS environment. We analyzed the statistics needed in an MDBS environment and classified them into three categories: point-based, distribution-function-based, and dependency-based. Fuzzy numbers were adopted to represent point-based statistics, and a fuzzy polynomial regression method was developed for estimating distribution-function-based statistics (i.e., attribute or join selectivity) from a set of subquery results. For dependency-based statistics, a fuzzy regression method was employed for estimating logical-parameter-based local cost functions. Methods for ranking fuzzy numbers, which are fundamental to fuzzy-statistics-based MQO, were also discussed. The proposed fuzzy statistics estimation methods were illustrated with examples to demonstrate their applicability in supporting MQO.
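
To make the idea of a fuzzy point-based statistic concrete, the following is a minimal sketch, not the method of the paper. It assumes a triangular membership function and a centroid-based ranking index, which is only one of several ranking methods in the fuzzy-set literature; the abstract does not state which shape or ranking method the authors use. The class name `TriangularFuzzyNumber` and the cardinality values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Triangular fuzzy number (low, mode, high), assumed low <= mode <= high."""
    low: float
    mode: float
    high: float

    def __post_init__(self):
        if not (self.low <= self.mode <= self.high):
            raise ValueError("expected low <= mode <= high")

    def membership(self, x: float) -> float:
        """Degree in [0, 1] to which the crisp value x belongs to this number."""
        if x == self.mode:
            return 1.0
        if self.low < x < self.mode:
            return (x - self.low) / (self.mode - self.low)
        if self.mode < x < self.high:
            return (self.high - x) / (self.high - self.mode)
        return 0.0

    def __add__(self, other: "TriangularFuzzyNumber") -> "TriangularFuzzyNumber":
        """Fuzzy addition: the sum of two triangular numbers is triangular."""
        return TriangularFuzzyNumber(
            self.low + other.low, self.mode + other.mode, self.high + other.high
        )

    def centroid(self) -> float:
        """Centroid of the triangle, used here as an assumed ranking index."""
        return (self.low + self.mode + self.high) / 3.0


# Hypothetical estimated result cardinalities for two candidate subplans.
plan_a = TriangularFuzzyNumber(800, 1000, 1500)
plan_b = TriangularFuzzyNumber(900, 1050, 1200)

# Rank the candidates by centroid; the smaller index is preferred.
ranked = sorted(
    [("plan_a", plan_a), ("plan_b", plan_b)], key=lambda item: item[1].centroid()
)
best_name = ranked[0][0]
total = plan_a + plan_b

print(best_name, plan_a.centroid(), plan_b.centroid(), total)
```

Under these assumptions, `plan_b` ranks ahead of `plan_a` even though its most likely value is higher, because the wide upper tail of `plan_a` pulls its centroid up. A different ranking method could order the same pair differently, which is why the choice of ranking method matters for the optimizer.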
