Implementing a General Spatial Indexing Library for Relational Databases of Large Numerical Simulations

Large multi-terabyte numerical simulations of different physical systems consist of billions of particles or grid points and hundreds to thousands of snapshots. Increasingly, these data sets are stored in large object-relational databases. Most statistical analyses require extracting spatio-temporal subsets of the data. The built-in spatial indexes of commercial systems lack features that many applications in the physical sciences require. We describe a library, implemented on several language and platform combinations (Java/Oracle, C#/SQL Server), that is based on generic space-filling curves supplied as plug-ins. The index maps a higher-dimensional space onto the standard linear B-tree index of any relational database. The architecture supports intersections with different geometric primitives. The library has been used for cosmological N-body simulations and isotropic turbulence, providing sub-second response times over data sets exceeding several tens of terabytes. It can also address complex space-time problems, such as temporal look-back into the past light-cones of cosmological simulations.
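To make the mapping concrete, the sketch below shows one way a three-dimensional grid position can be turned into a single integer key, and how an axis-aligned box query can then be answered with a handful of one-dimensional range scans. It is an illustration only, under stated assumptions: it uses the Z-order (Morton) curve because it is the simplest space-filling curve to write down, whereas the abstract does not say which curves the library ships with; the function names (morton_encode, box_to_key_ranges, range_scan) are hypothetical and are not the library's API; and a sorted Python list stands in for the database B-tree.

```python
from bisect import bisect_left, bisect_right


def morton_encode(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def morton_decode(key, bits=10):
    """Inverse of morton_encode: recover (x, y, z) from a Z-order key."""
    x = y = z = 0
    for i in range(bits):
        x |= ((key >> (3 * i)) & 1) << i
        y |= ((key >> (3 * i + 1)) & 1) << i
        z |= ((key >> (3 * i + 2)) & 1) << i
    return x, y, z


def box_to_key_ranges(lo, hi, bits=10):
    """Cover the inclusive integer box [lo, hi] with sorted Z-order key ranges.

    The grid is split recursively into octants. An octant that lies entirely
    inside the box corresponds to one contiguous run of keys.
    """
    ranges = []

    def visit(origin, level):
        side = 1 << level
        cell_hi = tuple(o + side - 1 for o in origin)
        if any(cell_hi[d] < lo[d] or origin[d] > hi[d] for d in range(3)):
            return  # octant and box are disjoint
        if all(lo[d] <= origin[d] and cell_hi[d] <= hi[d] for d in range(3)):
            start = morton_encode(*origin, bits=bits)
            end = start + (1 << (3 * level)) - 1
            if ranges and ranges[-1][1] + 1 == start:
                ranges[-1] = (ranges[-1][0], end)  # merge adjacent runs
            else:
                ranges.append((start, end))
            return
        half = side >> 1
        for child in range(8):  # children are visited in Z-order
            visit(
                (
                    origin[0] + half * (child & 1),
                    origin[1] + half * ((child >> 1) & 1),
                    origin[2] + half * ((child >> 2) & 1),
                ),
                level - 1,
            )

    visit((0, 0, 0), bits)
    return ranges


def range_scan(sorted_keys, ranges):
    """Stand-in for B-tree range scans: collect keys falling in any range."""
    hits = []
    for start, end in ranges:
        i = bisect_left(sorted_keys, start)
        j = bisect_right(sorted_keys, end)
        hits.extend(sorted_keys[i:j])
    return hits
```

In a relational setting, each (start, end) pair would presumably become a BETWEEN predicate on an indexed key column, so the database engine needs nothing beyond its ordinary B-tree. Other geometric primitives could follow the same pattern by replacing the two box tests (disjoint, fully contained) with the corresponding tests for a sphere or cone.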
