Methodologies and Performance Metrics to Evaluate Multiprogram Workloads

Multicore processors dominate the microprocessor market, and most research work has moved to this kind of processor. Multicore research methods are still immature and are evolving from their single-threaded processor counterparts. Three main research issues must be faced when evaluating performance and energy in multicores. First, multiple simulation methodologies are being applied to evaluate these systems, with no agreement about which one to use. Second, the nature of multiprogram workloads requires new performance metrics, different from those used for single-thread processors. Many metrics have been defined, and different published works use different metrics. Finally, multicore processors are highly complex systems that require sophisticated and complementary (e.g., energy and performance) simulators. This paper aims to help researchers face these three research issues. For this purpose, we compare how these issues are handled across 28 papers published in 2013 in top computer architecture conferences. Both analytical examples and experimental results are presented with the aim of providing insights into multicore research.
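To make the metrics issue concrete, here is a minimal sketch, assuming each program's IPC is known both when it runs alone and when it co-runs in the multiprogram mix. It computes several definitions commonly used in the multiprogram literature (sum of IPCs, weighted speedup or STP, ANTT, harmonic mean of speedups) plus a min/max fairness ratio, which is only one of several fairness definitions in use and is assumed here. The `Program` class, the function names, and all IPC values are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Program:
    ipc_alone: float   # IPC when the program runs alone on the machine
    ipc_shared: float  # IPC when it co-runs with the rest of the workload


def progress(workload):
    # Normalized progress of each program: shared IPC over isolated IPC
    return [p.ipc_shared / p.ipc_alone for p in workload]


def ipc_throughput(workload):
    # Raw sum of IPCs; favors high-IPC programs and ignores slowdowns
    return sum(p.ipc_shared for p in workload)


def weighted_speedup(workload):
    # Sum of normalized progress, also called system throughput (STP); higher is better
    return sum(progress(workload))


def antt(workload):
    # Average normalized turnaround time, i.e. mean slowdown; lower is better
    return sum(1.0 / s for s in progress(workload)) / len(workload)


def harmonic_speedup(workload):
    # Harmonic mean of per-program progress; numerically equal to 1 / ANTT
    return len(workload) / sum(1.0 / s for s in progress(workload))


def fairness(workload):
    # Assumed fairness definition: min/max normalized progress (1.0 = perfectly even)
    s = progress(workload)
    return min(s) / max(s)


# Hypothetical two-program workload evaluated on two hypothetical designs
design_a = [Program(2.0, 1.8), Program(0.5, 0.15)]
design_b = [Program(2.0, 1.1), Program(0.5, 0.3)]

for name, w in (("A", design_a), ("B", design_b)):
    print(f"design {name}: IPC sum={ipc_throughput(w):.2f} "
          f"STP={weighted_speedup(w):.2f} ANTT={antt(w):.2f} "
          f"Hspeedup={harmonic_speedup(w):.2f} fairness={fairness(w):.2f}")
```

With these made-up numbers, the IPC sum (1.95 vs 1.40) and STP (1.20 vs 1.15) favor design A, while ANTT (about 2.22 vs 1.74, lower is better), the harmonic mean of speedups, and the fairness ratio favor design B. This kind of disagreement is why the choice of metric can change the conclusions of a multicore study.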
