Evaluating Scalability of Emerging Multithreaded Applications on Commodity Multicore Server

The performance of multithreaded applications is often limited by shared resources such as the shared cache and memory bandwidth. Several prior studies have examined this issue, but most of them have been constrained by their reliance on simulators and out-of-date benchmarks. In this work, we conduct experiments on real commodity chip multiprocessor (CMP) machines, using PARSEC, a recently released CMP benchmark suite, to investigate how cache sharing and memory bandwidth influence the scalability of emerging parallel applications. The results reveal the behavioral characteristics of these benchmarks and show that the shared cache and memory bandwidth are indeed bottlenecks for some of these applications. These findings have implications for hardware manufacturers and system software designers seeking to build scalable parallel systems.
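Scalability here is commonly quantified as speedup over a single-threaded run. The following is an illustrative helper only, not the authors' tooling. It assumes wall-clock times have already been measured for each thread count, and the function name `scalability_table` and its input format are hypothetical. For each thread count p it reports speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. A benchmark whose efficiency falls sharply as threads are added is a candidate for a shared-cache or memory-bandwidth bottleneck.

```python
from typing import Dict, List, Tuple


def scalability_table(times: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows sorted by thread count.

    Hypothetical helper: `times` maps a thread count to a measured
    wall-clock time in seconds and must include a 1-thread baseline.
    """
    if 1 not in times:
        raise ValueError("a single-threaded baseline time is required")
    rows = []
    for threads in sorted(times):
        # Reject non-positive thread counts or times before dividing.
        if threads < 1 or times[threads] <= 0:
            raise ValueError("thread counts and times must be positive")
        speedup = times[1] / times[threads]   # S(p) = T(1) / T(p)
        efficiency = speedup / threads        # E(p) = S(p) / p
        rows.append((threads, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings for illustration only; these are not measured PARSEC data.
    example = {1: 80.0, 2: 40.0, 4: 25.0, 8: 20.0}
    for p, s, e in scalability_table(example):
        print(f"{p:2d} threads: speedup {s:5.2f}, efficiency {e:5.2f}")
```

With these made-up timings, 4 threads yield a speedup of 3.20 (efficiency 0.80) and 8 threads a speedup of 4.00 (efficiency 0.50). That kind of flattening curve is what contention for a shared resource would typically look like.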
