High performance computing with spatial datasets
The importance of geo-spatial data is growing with the increasing availability of large geo-spatial datasets such as maps, remote-sensing images, and the decennial census. Applications include geo-spatial intelligence; real-time situation assessment (e.g. during disaster response); high-fidelity terrain visualization (e.g. Google Earth, flight simulators); location-based services; predicting the clustering or spread of disease; finding crime hot spots; and Mission to Planet Earth (global change and climatology, land-use classification). Many of these applications impose stringent performance and response-time constraints that today's sequential Geographic Information Systems (GIS) often cannot meet, owing to the large volume of geo-spatial datasets and the complexity of geo-spatial data items, including imagery and extended objects (e.g. polygons and line-strings).
High performance computing, e.g. parallelization of GIS, may meet the requirements of some of these applications. In this talk, we illustrate this point with two case studies. First, we focus on real-time terrain visualization in the context of flight simulators, whose workload can be modeled as range queries on geo-spatial datasets. Our work with the GIS range-query operation shows that data partitioning is an effective approach to achieving high performance in GIS. Because partitioning extended spatial objects is difficult, special techniques that go beyond random partitioning, such as systematic declustering, are needed. Experiments also show that replicating data may be needed to facilitate dynamic load balancing, as the cost of local processing is often less than the cost of data transfer for spatial objects. Second, we describe our recent effort to parallelize spatial data mining algorithms. In particular, we present preliminary results in parallelizing algorithms for multi-scale multi-granular classification and for estimating the parameters of the spatial auto-regression model, which generalizes the linear regression model to account for the lack of independence among nearby spatial data points.
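As a rough illustration of why the placement of data matters for range queries, the following sketch compares two ways of assigning grid cells to processors. It is not the method used in the work described here: the grid, the query window, the function names, and the diagonal round-robin rule are all assumptions chosen for illustration, and each cell stands in for a point-like data item rather than an extended object such as a polygon.

```python
from collections import Counter

def contiguous_owner(row, col, grid_size, num_procs):
    """Hypothetical scheme: each processor owns one contiguous strip of rows."""
    rows_per_proc = grid_size // num_procs
    return min(row // rows_per_proc, num_procs - 1)

def declustered_owner(row, col, grid_size, num_procs):
    """Hypothetical scheme: round-robin along diagonals, so that
    neighbouring cells tend to land on different processors."""
    return (row + col) % num_procs

def range_query_load(owner, query, grid_size, num_procs):
    """Count how many cells of the query window each processor must scan."""
    row_lo, row_hi, col_lo, col_hi = query  # half-open bounds
    load = Counter()
    for row in range(row_lo, row_hi):
        for col in range(col_lo, col_hi):
            load[owner(row, col, grid_size, num_procs)] += 1
    return [load[p] for p in range(num_procs)]

GRID_SIZE = 16          # 16 x 16 grid of cells (assumed)
NUM_PROCS = 4           # number of processors (assumed)
QUERY = (0, 4, 0, 8)    # rows 0..3, columns 0..7: 32 cells

contiguous_load = range_query_load(contiguous_owner, QUERY, GRID_SIZE, NUM_PROCS)
declustered_load = range_query_load(declustered_owner, QUERY, GRID_SIZE, NUM_PROCS)

print("contiguous :", contiguous_load)
print("declustered:", declustered_load)
```

With contiguous strips, the whole query window falls on a single processor, whereas the declustered assignment spreads the same cells evenly. Extended objects that span several cells complicate this picture considerably, which is the difficulty the abstract refers to.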