Manycore GPU processing of repeated range queries over streams of moving objects observations

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data‐intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper, we focus on a specific data‐intensive problem concerning the repeated processing of huge numbers of range queries over massive sets of moving objects, where the spatial extent of queries and objects is continuously modified over time. To tackle this problem and significantly accelerate query processing, we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad‐hoc spatial index leading to a problem decomposition that results in a set of independent data‐parallel tasks. The index is based on a point‐region quadtree space decomposition and effectively handles a broad range of spatial object distributions, even highly skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non‐trivial GPU data structures that avoid the need for locked memory accesses while favouring coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge, this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, possibly characterized by highly skewed spatial distributions. In comparison with state‐of‐the‐art CPU‐based implementations, our method achieves significant speedups on the order of 10–20×, depending on the dataset. Copyright © 2016 John Wiley & Sons, Ltd.
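To make the point‐region quadtree decomposition mentioned above more concrete, the following is a minimal sequential sketch in Python. It is not the paper's hybrid CPU/GPU pipeline: the names `Node`, `build` and `range_query`, the leaf `capacity`, and the `max_depth` limit are all assumptions introduced here for illustration. It only shows how a cell that holds too many objects is split into four quadrants, so that dense regions end up with smaller cells, and how a rectangular range query prunes cells it does not overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Cell bounds; a point on an internal split line goes to the upper/right child.
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)    # (object_id, x, y) tuples, leaves only
    children: list = field(default_factory=list)  # empty for a leaf, else 4 quadrants


def build(node, capacity=4, depth=0, max_depth=16):
    """Recursively split a cell while it holds more than `capacity` points."""
    if len(node.points) <= capacity or depth == max_depth:
        return node
    mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    node.children = [
        Node(node.x0, node.y0, mx, my),  # index 0: lower-left
        Node(mx, node.y0, node.x1, my),  # index 1: lower-right
        Node(node.x0, my, mx, node.y1),  # index 2: upper-left
        Node(mx, my, node.x1, node.y1),  # index 3: upper-right
    ]
    for oid, x, y in node.points:
        idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
        node.children[idx].points.append((oid, x, y))
    node.points = []
    for child in node.children:
        build(child, capacity, depth + 1, max_depth)
    return node


def range_query(node, qx0, qy0, qx1, qy1, out):
    """Append to `out` the ids of objects inside the closed rectangle [qx0,qx1] x [qy0,qy1]."""
    # Prune cells that cannot overlap the query rectangle.
    if qx1 < node.x0 or qx0 > node.x1 or qy1 < node.y0 or qy0 > node.y1:
        return out
    if node.children:
        for child in node.children:
            range_query(child, qx0, qy0, qx1, qy1, out)
    else:
        out.extend(oid for oid, x, y in node.points
                   if qx0 <= x <= qx1 and qy0 <= y <= qy1)
    return out


# Hypothetical workload: 200 synthetic object positions in a 100 x 100 space.
objects = [(i, (i * 37) % 100 + 0.5, (i * 53) % 100 + 0.5) for i in range(200)]
root = build(Node(0.0, 0.0, 100.0, 100.0, points=list(objects)))
hits = sorted(range_query(root, 10, 10, 40, 30, []))
```

In this sketch every leaf can be examined independently of the others, which is the property that makes such a decomposition a natural source of data‐parallel tasks; how those tasks are mapped onto GPU threads and memory is specific to the paper and is not reproduced here.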
