Multi-program benchmark definition

Although defining a single-program benchmark is relatively straightforward (a benchmark is a program plus a specific input), defining a multi-program benchmark is more complex. Each program may have a different runtime, and the programs may interact differently depending on how their executions align with each other. While prior work has focused on sampling multi-program benchmarks, little attention has been paid to defining the benchmarks in their entirety. In this work, we propose a four-tuple that formally defines multi-program benchmarks. We then examine how four different classes of benchmarks, created by varying the elements of this tuple, align with real-world use cases. We evaluate the impact of these variations on real hardware and see drastic variations in results between different benchmarks constructed from the same programs. Notable differences include significant speedups versus slowdowns (e.g., +57% vs. -5% or +26% vs. -18%) and large differences in magnitude even when the results are in the same direction (e.g., 67% versus 11%).
