Performance characterization of molecular dynamics techniques for biomolecular simulations

Large-scale simulations and computational modeling using molecular dynamics (MD) continue to make significant impacts in the field of biology. It is well known that simulating biological events at native time and length scales requires computing power several orders of magnitude beyond that of today's commonly available systems. Supercomputers such as the IBM Blue Gene/L and Cray XT3 will soon make tens to hundreds of teraFLOP/s of computing power available by utilizing thousands of processors. Popular MD algorithms and applications, however, were not initially designed to run on thousands of processors. In this paper, we present a detailed investigation of the performance issues that are crucial for improving the scalability of MD-related algorithms and applications on massively parallel processing (MPP) architectures. Because biological input problems vary in their characteristics, we study two prototypical biological complexes simulated with the MD algorithm: one in explicit solvent and one in implicit solvent. In particular, we study the AMBER application, which supports both types of input problem. For the explicit solvent problem, we focus on the particle mesh Ewald (PME) method for calculating the electrostatic energy; for the implicit solvent model, we target the Generalized Born (GB) calculation. We uncovered a limitation in AMBER that restricted scaling beyond 128 processors and modified the code to address it. We collected performance data for experiments on up to 2048 Blue Gene/L and XT3 processors and found that scaling is largely limited by the underlying algorithmic characteristics and by the implementation of the algorithms. Furthermore, we found that the input problem size of a biological system is constrained by the memory available per node. In conclusion, our results indicate that MD codes can benefit significantly from current-generation architectures with relatively modest optimization efforts. Nevertheless, the key to enabling scientific breakthroughs lies in exploiting the full potential of these new architectures.
