Reducing Migration-Induced Misses in an Over-Subscribed Multiprocessor System

In a large multiprocessor server platform built from multicore chips, the scheduler often migrates a thread or process to balance load or to ensure fairness among competing scheduling entities. Each migration incurs a significant performance penalty: the cache and Translation Lookaside Buffer (TLB) footprints are lost, leading to more cache misses and page walks after the move. The penalty is likely to be even more severe in virtualized environments, where heavy over-subscription of CPUs is common in server consolidation workloads and virtual desktop infrastructure deployments, causing frequent migrations and context switches. We demonstrate the performance benefit of preserving a portion of the L2 cache, in particular its most recently used (MRU) cache lines, and warming the destination L2 cache by prefetching those lines, under different migration scenarios. We observe a 1.5-27% reduction in CPI (cycles per instruction) following a migration. We also study the effectiveness of preserving TLB entries across a context switch or migration.
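The core idea (recording the identities of a thread's most recently used cache lines when it is migrated, then prefetching those lines on the destination core so the thread resumes against a warm L2) can be sketched in software, although the work itself evaluates a hardware mechanism in simulation. The C sketch below is purely illustrative: the names (mru_log_t, mru_log_touch, warm_destination_cache, MRU_LOG_SIZE) and the use of compiler prefetch builtins are assumptions for this sketch, not the authors' implementation.

/*
 * Illustrative sketch only: approximates in software the idea of saving a
 * thread's MRU cache-line addresses at migration time and prefetching them
 * on the destination core. All names here are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>

#define MRU_LOG_SIZE 256   /* number of MRU line addresses saved per thread (assumed) */
#define LINE_SIZE    64    /* assumed cache-line size in bytes */

/* Per-thread record of the most recently used cache-line addresses,
 * captured (conceptually) when the scheduler decides to migrate the thread. */
typedef struct {
    uintptr_t lines[MRU_LOG_SIZE];
    size_t    count;
} mru_log_t;

/* Record a cache-line address touched by the thread (circular buffer,
 * so only the most recent MRU_LOG_SIZE lines are retained). */
static void mru_log_touch(mru_log_t *log, const void *addr)
{
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(LINE_SIZE - 1);
    log->lines[log->count % MRU_LOG_SIZE] = line;
    log->count++;
}

/* After migration, run on the destination core: prefetch the saved MRU
 * lines so the thread finds a warm L2 instead of a cold one. */
static void warm_destination_cache(const mru_log_t *log)
{
    size_t n = log->count < MRU_LOG_SIZE ? log->count : MRU_LOG_SIZE;
    for (size_t i = 0; i < n; i++) {
        /* GCC/Clang builtin: read prefetch with moderate temporal locality. */
        __builtin_prefetch((const void *)log->lines[i], 0, 2);
    }
}

In a real system the equivalent of mru_log_touch would be performed by the cache hardware rather than by instrumented code; the sketch only conveys where the two steps (capture at the source, prefetch at the destination) fit around a migration.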
