Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors

An important issue in current multi-core processors is the sharing of off-chip bandwidth. Sharing helps improve resource utilization, but more importantly, it may cause performance degradation due to contention. However, there has been little research on characterizing workloads from a bandwidth perspective. Moreover, the impact of the bandwidth constraint on performance is still poorly understood. In this paper, we propose the phase execution model and evaluate the arithmetic to memory ratio (AMR) of each phase to characterize the bandwidth requirements of arbitrary programs. We apply the model to a set of SPEC benchmark programs and obtain two results. First, we propose a new taxonomy of workloads based on their bandwidth requirements. Second, we find that prefetching techniques improve the system throughput of multi-core processors only when there is enough spare memory bandwidth.
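As a rough illustration of a per-phase AMR, the sketch below assumes that AMR is the number of arithmetic operations in a phase divided by the number of memory accesses in that phase. The abstract does not give the exact definition, so this formula, the `Phase` record, its counter fields, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """Hypothetical per-phase counters (not the paper's data format)."""
    name: str
    arithmetic_ops: int    # assumed: arithmetic instructions in the phase
    memory_accesses: int   # assumed: off-chip memory accesses in the phase


def amr(phase: Phase) -> float:
    """Assumed definition: arithmetic operations per memory access."""
    if phase.memory_accesses == 0:
        # A phase with no memory traffic places no demand on bandwidth.
        return float("inf")
    return phase.arithmetic_ops / phase.memory_accesses


def most_bandwidth_hungry(phases: list[Phase]) -> Phase:
    """Return the phase with the lowest AMR under the assumed definition."""
    return min(phases, key=amr)


if __name__ == "__main__":
    # Made-up counters for illustration only.
    phases = [
        Phase("init", arithmetic_ops=100, memory_accesses=50),
        Phase("kernel", arithmetic_ops=900, memory_accesses=100),
    ]
    for p in phases:
        print(f"{p.name}: AMR = {amr(p):.2f}")
    print("lowest AMR:", most_bandwidth_hungry(phases).name)
```

Under this assumed definition, a lower AMR means more memory traffic per unit of computation, so a low-AMR phase is the one most likely to suffer when off-chip bandwidth is shared.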
