Multi-paradigm, multi-threaded, and multi-core computing devices available today provide several orders of magnitude performance improvement over mainstream microprocessors. These devices include the STI Cell Broadband Engine, graphics processing units (GPUs), and the Cray massively multithreaded processors; they are available in desktop computing systems and have been proposed for supercomputing platforms. The main challenge in using these powerful devices is their unique programming paradigms. GPUs and Cell systems require code developers to manage code and data explicitly, while the Cray multithreaded architecture requires them to generate a very large number of concurrent threads or independent tasks. In this paper, we explain strategies for optimizing a molecular dynamics (MD) calculation used in biomolecular simulations on three devices: the Cell, a GPU, and the Cray MTA-2. We show that the Cray MTA-2 system requires minimal code modification and does not outperform the microprocessor runs, but it demonstrates improved workload scaling behavior over the microprocessor implementation. In contrast, substantial porting and optimization efforts on the Cell and GPU systems result in 5x and 6x improvements, respectively, over a 2.2 GHz Opteron system.
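To make the kind of kernel at stake concrete, the following is a minimal sketch of an all-pairs nonbonded force calculation, written so that the work for each atom is an independent task. It is an illustration only: the abstract does not name the interaction potential, so the Lennard-Jones form, the reduced units, the cutoff value, and the function names `force_on_atom` and `all_forces` are assumptions, not the implementation described in the paper.

```python
def force_on_atom(i, positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Force on atom i and its share of the potential energy.

    Assumed potential: Lennard-Jones, U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6).
    The function only reads `positions`, so calls for different i are independent.
    """
    xi, yi, zi = positions[i]
    fx = fy = fz = 0.0
    energy = 0.0
    cutoff2 = cutoff * cutoff
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue
        dx, dy, dz = xi - xj, yi - yj, zi - zj
        r2 = dx * dx + dy * dy + dz * dz
        if r2 >= cutoff2:
            continue  # pair lies outside the interaction cutoff
        s2 = sigma * sigma / r2
        s6 = s2 * s2 * s2
        s12 = s6 * s6
        # Magnitude factor of -dU/dr, already divided by r.
        coeff = 24.0 * epsilon * (2.0 * s12 - s6) / r2
        fx += coeff * dx
        fy += coeff * dy
        fz += coeff * dz
        # Each pair is visited from both atoms, so count half per visit.
        energy += 0.5 * 4.0 * epsilon * (s12 - s6)
    return (fx, fy, fz), energy


def all_forces(positions, **params):
    """Serial driver; each call to force_on_atom could be a separate task."""
    results = [force_on_atom(i, positions, **params) for i in range(len(positions))]
    forces = [f for f, _ in results]
    total_energy = sum(e for _, e in results)
    return forces, total_energy


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0)]
    forces, energy = all_forces(atoms)
    print(forces, energy)
```

The sketch deliberately visits every pair twice instead of exploiting Newton's third law, which trades redundant arithmetic for tasks that write only their own output. That trade is one common way to expose many independent units of work; whether it matches the strategies used in the paper on any of the three devices is not stated in the abstract.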