Utility Based Query Dissemination in Spatial Data Grid

Spatial Information Grid is an ideal infrastructure for data-intensive and computing-intensive geospatial processing. So that each agency can connect to this computing environment ad hoc and make autonomous decisions, we build a Geospatial Data Grid in a peer-to-peer fashion. The query processor module in each peer decomposes a user's query into sub-queries that are executed on different nodes. One problem in optimizing parallel spatial join queries is how to determine an appropriate group of nodes to which the sub-queries are disseminated. In particular, when more than one node shares the same area of interest, there is a dilemma: on the one hand, the task scheduler tends to decompose the query into sub-queries and disseminate them to as many nodes as possible so that they can process the user's query in parallel; on the other hand, recruiting too many nodes incurs overhead from repetitive computation, redundant data transmission, and result merging. Based on a study of the trade-off between increasing parallelism and reducing redundancy, using utility theory from economics, we put forward a fast node selection algorithm for disseminating parallel spatial join queries. Tests in our system show that this strategy balances these two conflicting demands and is appropriate for use in a Data Grid.
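The trade-off described above can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a generic greedy selection under assumed, simplified cost terms. The `Node` fields `speed` and `overhead`, the `completion_time` model (work split in proportion to speed, plus a per-node redundancy cost), and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    speed: float     # assumed: work units this node processes per second
    overhead: float  # assumed: seconds of transfer/merge cost added by recruiting it


def completion_time(nodes, work):
    """Estimated time when `work` is split in proportion to node speed.

    Assumed model: a parallel term that shrinks as nodes are added, plus a
    redundancy term that grows with every recruited node.
    """
    if not nodes:
        return float("inf")
    parallel = work / sum(n.speed for n in nodes)
    redundancy = sum(n.overhead for n in nodes)
    return parallel + redundancy


def select_nodes(candidates, work):
    """Greedily recruit the node with the best marginal gain.

    Stops as soon as adding any remaining node no longer reduces the
    estimated completion time, i.e. its marginal utility is not positive.
    """
    chosen = []
    remaining = list(candidates)
    best_time = float("inf")
    while remaining:
        best = min(remaining, key=lambda n: completion_time(chosen + [n], work))
        t = completion_time(chosen + [best], work)
        if t >= best_time:
            break
        chosen.append(best)
        remaining.remove(best)
        best_time = t
    return chosen, best_time
```

With two fast nodes and one slow, costly node, a selection of this kind would recruit the two fast nodes and leave the third out, since its redundancy cost outweighs the extra parallelism. A real scheduler would need measured costs for data transmission and result merging rather than the fixed per-node constants assumed here.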
