Efficient Scheduling of Resources for Parallel Query Processing on Grid-based Architecture

Parallel query processing is an increasingly attractive option for improving the performance of database systems. It is also important in grids, since grid technologies enable sophisticated interaction and data sharing between resources that may belong to different departments or organizations. The decreasing cost of computing makes it economically viable to reduce the response time of decision support queries by executing them in parallel on inexpensive resources. This paper proposes an architecture for resource scheduling and site selection for parallel queries executed on the grid. Its main aims are to choose appropriate resources and to match sub-plans with those resources, which requires deciding how to allocate the available processors among a number of competing database operations running in parallel. For scheduling intra-query parallelism, two new approaches are proposed: a resource-balancing site selection algorithm, which determines where to execute the upcoming operation sequence based on the proposed architecture, and a resource assignment policy that makes efficient use of idle nodes. The paper also presents an architecture for scheduling inter-query parallelism and proposes a hierarchical model that fully exploits the available parallelism and the interactions among different queries.
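The abstract does not spell out the site selection algorithm, so the following is only a minimal sketch of what a resource-balancing placement could look like, not the paper's method. It assumes a simple greedy rule: each upcoming operation goes to the site whose utilization would be lowest after taking it. The names `Site`, `select_site`, and `schedule`, the `capacity` and `load` fields, and the cost units are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A grid node (hypothetical model, not taken from the paper)."""
    name: str
    capacity: float                 # assumed relative processing capacity
    load: float = 0.0               # work already assigned, in cost units
    assigned: list = field(default_factory=list)

    def utilization_with(self, cost: float) -> float:
        # Utilization the site would have if it also took this operation.
        return (self.load + cost) / self.capacity


def select_site(sites, cost):
    # Greedy balancing rule (an assumption): pick the site that ends up
    # least utilized; ties go to the earliest site in the list.
    return min(sites, key=lambda s: s.utilization_with(cost))


def schedule(sites, operations):
    """Place each (operation name, estimated cost) pair in arrival order."""
    placement = {}
    for op_name, cost in operations:
        site = select_site(sites, cost)
        site.load += cost
        site.assigned.append(op_name)
        placement[op_name] = site.name
    return placement


if __name__ == "__main__":
    sites = [Site("A", capacity=2.0), Site("B", capacity=1.0)]
    ops = [("scan", 4.0), ("join", 2.0), ("sort", 2.0)]
    print(schedule(sites, ops))
```

With these illustrative numbers, the faster site A receives the scan and the sort while B receives the join. A real grid scheduler would also have to account for data location, transfer cost, and dependencies between operations, which this sketch ignores.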
