MARS: A multi-level array representation for simulation data

Abstract: In numerical simulation, the sheer size of the data and the complexity of implementing domain-specific applications have drawn considerable attention to database-centric approaches for handling multidimensional simulation data. Array databases provide a set of features optimized for managing multidimensional data, so representing simulation data as an array can be a well-suited choice. However, query performance on sparsely filled arrays can be poor, especially when empty cells lie between adjacent elements. Previous studies therefore focused on compact representations of simulation data that reduce the number of empty cells between adjacent elements as much as possible. These methods, however, inevitably lose the original spatial structure of the elements (i.e., the relative distance and direction among elements), making it impossible to use the built-in multidimensional operators provided by array databases. In this paper, we propose MARS, a multi-level array representation for simulation data. MARS uses multiple level arrays with different resolutions to address both problems: sparsity and the loss of spatial structure. In the MARS representation, elements tend to be concentrated into dense array regions, and each region is selectively stored in the level array that most reduces the empty cells between adjacent elements. Unlike existing methods, MARS retains the spatial structure of the elements, so no additional effort to reorganize the original spatial structure is required for query processing. We built MARS on top of SciDB and implemented a specialized command-line tool for it. We present methods and optimized operators for query processing over MARS, and we evaluate its performance using two real-world numerical simulation datasets.
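
The idea of storing a region in the level array that most reduces empty cells can be illustrated with a toy sketch. This is not the MARS algorithm itself, which the abstract does not specify. It assumes, purely for illustration, that level k has cells of side 2**k on the finest grid and that a level is usable for a region only if no two elements fall into the same cell. The function names `cells_at_level`, `empty_cells`, and `choose_level` are hypothetical.

```python
def cells_at_level(points, level):
    """Map fine-grid coordinates to cell indices at a level with cell side 2**level.

    Floor division keeps the relative order and direction of elements,
    which is the property the abstract calls the spatial structure.
    """
    side = 2 ** level
    return [tuple(c // side for c in p) for p in points]


def empty_cells(cells):
    """Count empty cells inside the bounding box of the occupied cells."""
    volume = 1
    for d in range(len(cells[0])):
        lo = min(c[d] for c in cells)
        hi = max(c[d] for c in cells)
        volume *= hi - lo + 1
    return volume - len(set(cells))


def choose_level(points, max_level):
    """Return (level, empty_count) for the collision-free level with fewest empty cells.

    Returns None if no level is collision-free (e.g. duplicate input points).
    """
    best = None
    for level in range(max_level + 1):
        cells = cells_at_level(points, level)
        if len(set(cells)) != len(points):
            continue  # assumed rule: two elements may not share one cell
        cost = empty_cells(cells)
        if best is None or cost < best[1]:
            best = (level, cost)
    return best


# Four elements spaced 4 apart on the finest grid (hypothetical data).
region = [(0, 0), (4, 0), (0, 4), (4, 4)]
print(choose_level(region, max_level=3))
```

For this region, the finest level leaves 21 empty cells in a 5 x 5 bounding box, while level 2 packs the same four elements into a 2 x 2 block with none, and their relative positions are unchanged.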
