Exploring Spatial Indexing for Accelerated Feature Retrieval in HPC

Despite the critical role that range queries play in analysis and visualization for HPC applications, there has been no comprehensive analysis of indices designed to accelerate range queries or of the extent to which they are viable in an HPC setting. In this state-of-the-practice paper, we present the first such evaluation, examining 20 open-source C and C++ libraries that support range queries. The contributions of this paper include answers to the following questions: Which of the implementations are viable in an HPC setting? How do these libraries compare in terms of build time, query time, memory usage, and scalability? What other trade-offs exist between these implementations? Is there a single overall best solution? When does a brute-force solution offer the best performance? We also share key insights learned during this process that can assist both HPC application scientists and spatial index developers.
