Heuristic approach for early separated filter and refinement strategy in spatial query optimization

Recently, we proposed an optimization strategy for mixed spatial and non-spatial queries. In this strategy, the filter step and the refinement step of a spatial operator are regarded as individual algebraic operators, and the query optimizer separates them early, at the algebraic level. As a result, an optimizer using the strategy can generate more diverse and more efficient plans than a traditional optimizer. We called this optimization strategy Early Separated Filter And Refinement (ESFAR).

In this paper, we improve the cost model of the ESFAR optimizer to account for real-life conditions such as the LRU buffer, the clustering of the dataset, and the selectivity of real data distributions. We also conduct a new experiment for ESFAR that compares the optimization result generated by the new cost model with the actual execution result on real data. The experimental result shows that our cost model is accurate and that our ESFAR optimizer estimates the costs of execution plans well.

Since the ESFAR strategy has more operators and more rules than the traditional one, it consumes more optimization time. In this paper, we apply two existing heuristic algorithms, iterative improvement (II) and simulated annealing (SA), to the ESFAR optimizer. Additionally, we propose a new heuristic algorithm for finding a good initial state for II and SA. Through experiments, we show that the II and SA algorithms in the ESFAR strategy find a good sub-optimal plan in reasonable time. In most cases, the heuristic algorithms find a lower-cost plan, in less time, than the optimal plan generated by the traditional optimizer. In particular, the II algorithm with the initial-state heuristic rapidly finds a high-quality plan.
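For readers unfamiliar with the two heuristics named above, the following is a minimal generic sketch of iterative improvement and simulated annealing over a plan space. It is not the ESFAR optimizer: the plan representation (a tuple of operator weights), the cost function `toy_cost`, the move `swap_neighbor`, and all parameter values are hypothetical stand-ins chosen only so the example runs. The sketch also starts from a single caller-supplied state, whereas the abstract describes a dedicated heuristic for choosing that initial state.

```python
import math
import random


def toy_cost(plan):
    # Hypothetical cost: position-weighted sum, NOT the ESFAR cost model.
    return sum((i + 1) * w for i, w in enumerate(plan))


def swap_neighbor(plan, rng):
    # Hypothetical move: exchange two randomly chosen positions in the plan.
    i, j = rng.sample(range(len(plan)), 2)
    out = list(plan)
    out[i], out[j] = out[j], out[i]
    return tuple(out)


def iterative_improvement(start, cost, neighbor, rng, max_tries=50):
    # Accept only downhill moves; stop after max_tries consecutive failures.
    state, best = start, cost(start)
    tries = 0
    while tries < max_tries:
        cand = neighbor(state, rng)
        c = cost(cand)
        if c < best:
            state, best, tries = cand, c, 0
        else:
            tries += 1
    return state, best


def simulated_annealing(start, cost, neighbor, rng,
                        temp=100.0, cooling=0.9,
                        steps_per_temp=20, min_temp=0.01):
    # Accept uphill moves with probability exp(-delta / temp), cooling over time.
    state, cur = start, cost(start)
    best_state, best = state, cur
    while temp > min_temp:
        for _ in range(steps_per_temp):
            cand = neighbor(state, rng)
            c = cost(cand)
            delta = c - cur
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                state, cur = cand, c
                if cur < best:
                    best_state, best = state, cur
        temp *= cooling
    return best_state, best
```

Both functions take the start state as an argument, so a better initial state (for example, one produced by a separate heuristic) can be supplied without changing the search loop itself.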
