BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs

Characterizing the dynamic behavior of a program is essential for optimizing it on a given system. Once the program's repetitive execution phases and their boundaries have been correctly identified, various phase-aware optimizations can be applied. Multithreaded workloads exhibit dynamic behavior that is further affected by the sharing of data and platform resources, and this effect will intensify as computer systems and workloads become denser and more parallel. In this work, we introduce a new, relaxed concept of a parallel program phase, called an epoch. Epochs are defined as the time intervals between global synchronization points that programmers insert into their code for correct parallel execution. We characterize the behavior of multithreaded workloads across and within epochs and show that epochs exhibit consistent and repetitive behavior, while their boundaries naturally indicate a shift in program behavior. We show that epoch changes can be captured easily at run time without complex monitoring and decision mechanisms, and we employ simple run-time techniques to enable epoch-based adaptation. To highlight the efficacy of our approach, we present a case study of an epoch-based adaptive chip multiprocessor (CMP) architecture. We conclude that our approach provides an attractive new framework for lightweight phase-based resource management in future CMPs.
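As a rough illustration of the epoch definition above, the following sketch marks an epoch boundary each time all threads pass a global barrier. It is not the paper's mechanism: the `EpochTracker` class, the `run` helper, and the synthetic per-thread work are hypothetical names introduced only for this example, and Python's standard `threading.Barrier` stands in for whatever synchronization primitive a real workload uses.

```python
import threading
import time


class EpochTracker:
    """Hypothetical helper: counts epochs delimited by a global barrier."""

    def __init__(self, n_threads):
        self.epoch = 0
        # Timestamp of program start, then one timestamp per barrier release.
        self.boundaries = [time.perf_counter()]
        # The barrier's action runs once per release, in exactly one thread,
        # before any waiting thread is allowed to continue.
        self.barrier = threading.Barrier(n_threads, action=self._on_boundary)

    def _on_boundary(self):
        # An epoch ends here; adaptation logic could be triggered at this point.
        self.boundaries.append(time.perf_counter())
        self.epoch += 1

    def wait(self):
        self.barrier.wait()


def run(n_threads=4, n_barriers=3):
    """Run a toy workload in which every thread hits n_barriers barriers."""
    tracker = EpochTracker(n_threads)

    def worker(tid):
        for _ in range(n_barriers):
            sum(range(1000 * (tid + 1)))  # stand-in for per-epoch work
            tracker.wait()                # global synchronization point

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker


if __name__ == "__main__":
    tracker = run()
    durations = [b - a for a, b in zip(tracker.boundaries, tracker.boundaries[1:])]
    print(f"epochs observed: {tracker.epoch}")
    print("epoch durations (s):", [round(d, 6) for d in durations])
```

The point of the sketch is that boundary detection needs no sampling or classification: the barrier release itself is the event, so per-epoch statistics can be collected or a configuration changed inside the barrier callback.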
