Accelerating range queries for large-scale unstructured meshes

Scientific datasets continue to grow in size as simulations increase in resolution and scale. Unstructured meshes are essential to certain fields of engineering and science, but they present special challenges for efficient access and processing. The work described in this paper uses the GPU to accelerate range queries over very large unstructured meshes. Prior work in this area introduced a preprocessing phase that partitions large unstructured meshes to improve locality in storage and memory. Here, we apply the computational power and memory bandwidth of GPUs to the partitioning problem, significantly reducing preprocessing time. To keep the GPU busy, we must overcome the poor locality of the original unstructured mesh. To this end, we developed our own approach to unstructured mesh I/O, called Direct Load, and show that it significantly outperforms a typical LRU cache. Our ultimate goal is to accelerate range queries, and our preprocessing steps allow us to parallelize range query processing with relatively simple GPU code. Experimental results show that our implementation outperforms the serial implementations by 4x for preprocessing and by over 100x for range queries.
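To illustrate why partitioning helps range queries, here is a minimal serial sketch in Python of a query that first skips whole partitions and then tests individual cells. It assumes each partition stores an axis-aligned bounding box plus a list of cells given as vertex coordinates. The names `Partition`, `make_partition`, and `range_query` are hypothetical and are not the paper's implementation. The per-cell test compares bounding boxes only, so it is a conservative filter: it can report a cell whose box touches the query even when the cell itself does not.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner), axis-aligned


def boxes_overlap(a: Box, b: Box) -> bool:
    """Axis-aligned boxes overlap iff their intervals overlap on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned box containing all the given points."""
    xs, ys, zs = zip(*points)
    return ((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))


@dataclass
class Partition:
    # Hypothetical layout: a partition's bounding box and its cells,
    # where each cell is a list of vertex coordinates.
    bbox: Box
    cells: List[List[Point]]


def make_partition(cells: List[List[Point]]) -> Partition:
    """Build a partition whose bbox encloses every vertex of every cell."""
    all_points = [p for cell in cells for p in cell]
    return Partition(bounding_box(all_points), cells)


def range_query(partitions: List[Partition], query: Box) -> List[List[Point]]:
    """Return cells whose bounding box overlaps the query box.

    Partitions whose bounding box misses the query are skipped entirely,
    so data that cannot contribute to the result is never examined.
    """
    hits = []
    for part in partitions:
        if not boxes_overlap(part.bbox, query):
            continue  # prune: no cell in this partition can intersect the query
        for cell in part.cells:  # assumed GPU mapping: one thread per cell
            if boxes_overlap(bounding_box(cell), query):
                hits.append(cell)
    return hits
```

In a GPU version, the inner loop over the cells of the surviving partitions is a natural unit of parallelism, for example one thread per cell. That mapping is an assumption made for illustration, not a description of the authors' kernels; an exact cell-box intersection test could replace the bounding-box check where conservative results are not acceptable.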
