Compression in Molecular Simulation Datasets

In this paper, we present a compression framework for molecular dynamics (MD) simulation data that achieves substantial compression by combining the strengths of principal component analysis (PCA) and the discrete cosine transform (DCT). Although the technique is lossy, its effect on analytics performed on the decompressed data is minimal. Compression ratios of up to 13 are achieved with acceptable errors in the results of analytical functions.
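To make the idea of combining PCA with the DCT concrete, here is a minimal sketch of one way such a pipeline could be arranged. It is not the framework described in the paper, whose details are not given in this abstract. The assumptions are: the trajectory is stored as an array of shape (frames, 3 × atoms), PCA is applied across the coordinate dimension, the DCT is applied along the time axis of the PCA scores, and only the lowest-frequency coefficients are kept. The function names `dct_matrix`, `compress`, and `decompress` and the parameters `n_components` and `n_coeffs` are hypothetical, and quantization and entropy coding are omitted.

```python
import numpy as np

def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    t = np.arange(n)[None, :]          # time index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)            # rescale the DC row so that m @ m.T == I
    return m

def compress(traj, n_components, n_coeffs):
    """Lossy compression of a (frames, dims) trajectory (illustrative only).

    Assumed steps: centre the data, project onto the leading principal
    axes, then keep the first n_coeffs DCT coefficients of each score
    series along the time axis.
    """
    mean = traj.mean(axis=0)
    centered = traj - mean
    # PCA via SVD: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]                    # (n_components, dims)
    scores = centered @ axes.T                  # (frames, n_components)
    coeffs = (dct_matrix(traj.shape[0]) @ scores)[:n_coeffs]
    return mean, axes, coeffs

def decompress(mean, axes, coeffs, n_frames):
    """Invert compress(): inverse DCT of the kept coefficients, then un-project."""
    d = dct_matrix(n_frames)
    # Dropped high-frequency coefficients are treated as zero.
    scores = d[:coeffs.shape[0]].T @ coeffs     # (n_frames, n_components)
    return scores @ axes + mean
```

In this sketch the original trajectory holds frames × dims values, while the compressed form holds dims + n_components × dims + n_coeffs × n_components values, so the ratio depends on how aggressively both truncations are chosen. Keeping every component and every coefficient reproduces the input up to floating-point error; any truncation makes the scheme lossy.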
