Architectural Support for Handling Jitter in Shared Memory Based Parallel Applications

As the number of cores per chip increases, it is becoming harder to guarantee optimal performance for parallel shared memory applications because of interference from kernel threads, interrupts, bus contention, and temperature management schemes (collectively referred to as jitter). We demonstrate that the performance of parallel programs drops by up to 35.22 percent in large chip multiprocessor (CMP) based systems. In this paper, we characterize jitter for large multi-core processors and evaluate the resulting loss in performance. We propose a novel jitter measurement unit that uses a distributed protocol to keep track of the number of wasted cycles. We then compensate for jitter by applying DVFS across a region of timing critical instructions called a frame. Additionally, we propose an OS cache that intelligently manages OS cache lines to reduce memory interference. Using detailed cycle accurate simulations, we show that we can execute a suite of Splash2 and Parsec benchmarks with a deterministic timing overhead limited to 2 percent for 14 out of 17 benchmarks with modest DVFS factors. We reduce the overall jitter by an average of 13.5 percent for Splash2 and 6.4 percent for Parsec. The area overhead of our scheme is limited to 1 percent.
