Computing Spatial Distance Histograms Efficiently in Scientific Databases

Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations pose great challenges for database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. In this paper, we propose a novel algorithm to compute SDH based on a data structure called density map, which can be easily implemented by augmenting a Quad-tree index. We also show the results of rigorous mathematical analysis of the time complexity of the proposed algorithm: our algorithm runs in Θ(N^{3/2}) time for two-dimensional data and Θ(N^{5/3}) time for three-dimensional data. We also propose an approximate SDH processing algorithm whose running time is independent of the input size N. Experimental results confirm our analysis and show that the approximate SDH algorithm achieves very high accuracy.
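For orientation, the straightforward quadratic-time computation that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the density-map algorithm proposed in the paper. The function name `brute_force_sdh`, the parameters `bucket_width` and `num_buckets`, and the choice to place any distance beyond the last bucket boundary into the last bucket are all assumptions made for this example.

```python
import math
from itertools import combinations


def brute_force_sdh(points, bucket_width, num_buckets):
    """Count all pairwise distances into equal-width buckets.

    Visits every unordered pair once, so the cost is N(N-1)/2
    distance computations for N points (quadratic in N).
    """
    histogram = [0] * num_buckets
    for p, q in combinations(points, 2):
        distance = math.dist(p, q)  # Euclidean distance, any dimension
        # Assumption: distances past the last boundary go in the last bucket.
        index = min(int(distance // bucket_width), num_buckets - 1)
        histogram[index] += 1
    return histogram


if __name__ == "__main__":
    sample = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
    print(brute_force_sdh(sample, bucket_width=2.0, num_buckets=3))
    # → [1, 0, 2]
```

The three sample points give pairwise distances of 5, 1, and roughly 4.24, so with buckets of width 2 one pair falls in the first bucket and two in the last. The bucket counts always sum to N(N-1)/2, which is why this baseline becomes impractical for large simulations.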
