Beyond the CM-5: A case study in performance analysis for the CM-5, T3D, and high performance RISC workstations
We present a comprehensive performance evaluation of our molecular dynamics code SPaSM on the CM-5 in order to devise optimization strategies for the CM-5, T3D, and RISC workstations. In this analysis, we focus on the effective use of the SPARC microprocessor by measuring instruction set utilization, cache effects, memory access patterns, and pipeline stall cycles. We then show that these measurements account for more than 99% of the observed execution time of our program. From this analysis we devise optimization strategies and show that our highly optimized ANSI C program, running only on the SPARC microprocessor of the CM-5, is only twice as slow as our Gordon Bell Prize-winning code that used the CM-5 vector units. On the CM-5E, we show that this optimized code runs faster than the vector unit version. We then apply these techniques to the Cray T3D and measure the resulting speedups. Finally, we show that simple optimization strategies are effective on a wide variety of high-performance RISC workstations.