Analyzing the Efficiency and Bottleneck of Scientific Programs on Imagine Stream Processor by Simulation

The Imagine stream processor has shown high performance and efficiency for media applications, and its potential for scientific applications is of great interest to the high-performance computing community. This paper investigates that potential from a new angle. It roughly divides scientific programs into three classes according to their computation-to-memory-access ratios. For each class, typical programs are written in the StreamC/KernelC stream languages and run on the cycle-accurate Imagine simulator. The performance data are analyzed in depth, with special attention to the performance bottlenecks, and are compared against data from two general-purpose x86 processors. The results show that programs with no DRAM accesses attain high floating-point performance and efficiency on Imagine; their performance is restricted only by limited ILP (instruction-level parallelism) and load imbalance across ALUs. Programs with computation-to-memory-operation ratios of O(n) attain absolute floating-point performance on Imagine comparable to that on general-purpose processors, but their floating-point efficiency is unsatisfactory. It is essential to optimize these programs for high SRF (stream register file) and LRF (local register file) reuse and high ILP on Imagine. Programs with lower computation-to-memory-operation ratios attain much lower floating-point performance and efficiency on Imagine than on x86 processors.
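The three-way classification by computation-to-memory ratio can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not taken from the paper: the example workloads (dense matrix multiply, vector add, and a hypothetical register-only kernel), the idealized operation counts in which each operand crosses the memory interface exactly once, and the function and class names.

```python
from fractions import Fraction


def matmul_counts(n):
    # Dense n x n matrix multiply: 2*n^3 flops; idealized traffic of
    # 3*n^2 words (read A and B, write C), assuming perfect on-chip reuse.
    return 2 * n**3, 3 * n**2


def vector_add_counts(n):
    # Vector add: n flops; 3*n words (read two vectors, write one).
    return n, 3 * n


def register_only_counts(n):
    # Hypothetical kernel that computes entirely out of on-chip
    # storage: some flops, zero off-chip words.
    return 10 * n, 0


def classify(counts_fn, sizes=(64, 128, 256)):
    """Classify a workload by how its flops-per-word ratio scales with n."""
    counts = [counts_fn(n) for n in sizes]
    if all(words == 0 for _, words in counts):
        return "no DRAM accesses"
    ratios = [Fraction(flops, words) for flops, words in counts]
    # A ratio that increases with problem size corresponds to the
    # O(n) class; a constant ratio corresponds to the low-ratio class.
    if ratios[-1] > ratios[0]:
        return "ratio grows with n"
    return "low constant ratio"


if __name__ == "__main__":
    for name, fn in [("register-only kernel", register_only_counts),
                     ("matrix multiply", matmul_counts),
                     ("vector add", vector_add_counts)]:
        print(f"{name}: {classify(fn)}")
```

Under these idealized counts, matrix multiply has a ratio of 2n/3 flops per word and vector add a constant 1/3. The matrix-multiply figure is reached only if each element moves on and off chip once, which is one way to read the abstract's emphasis on SRF and LRF reuse for the O(n) class.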
