Cache restoration for highly partitioned virtualized systems

The economics of server consolidation have led to the support of virtualization features in almost all server-class systems, and the virtualization feature set has become an area of significant competition. While most systems allow partitioning only at the relatively coarse grain of a single core, some systems also support multiprogrammed virtualization, whereby a system can be more finely partitioned through time-sharing, down to a percentage of a core being allotted to a virtual machine. When multiple virtual machines share a single core, however, performance can suffer due to the displacement of microarchitectural state. We introduce cache restoration, a hardware-based prefetching mechanism initiated by the underlying virtualization software when a virtual machine is scheduled onto a core, which prefetches the virtual machine's working set and warms its initial environment. Through cycle-accurate simulation of a POWER7 system, we show that, when applied to the POWER7's private per-core L3 last-level cache, the warm cache translates into an average performance improvement of 20% for a mixture of workloads on a highly partitioned core, compared to a virtualized server without cache restoration.
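As a rough illustration of the idea only, the sketch below models the two hypervisor hooks cache restoration would need: on deschedule, snapshot the tags of the lines the departing virtual machine holds in its private L3; on reschedule, replay those tags as prefetches so the cache is warm before the guest resumes. This is a minimal sketch, not the paper's actual hardware/hypervisor interface; all names (vm_footprint, snapshot_on_deschedule, restore_on_schedule) and parameters are hypothetical, and the prefetch itself is stubbed out with a print.

```c
/* Hypothetical sketch of cache restoration.  In the real design the
 * prefetching is done by hardware and triggered by the hypervisor;
 * here it is modelled entirely in host C code for illustration. */
#include <stdio.h>

#define L3_LINES   4096     /* lines tracked per virtual machine (assumed) */
#define LINE_BYTES 128      /* POWER7 cache-line size                      */

/* Per-VM record of the line tags resident when the VM was descheduled. */
struct vm_footprint {
    unsigned long tags[L3_LINES];
    int           count;
};

/* Deschedule hook: walk the private L3 directory (modelled as an array of
 * resident line addresses) and save the tags belonging to this VM. */
static void snapshot_on_deschedule(struct vm_footprint *fp,
                                   const unsigned long *resident,
                                   int nresident)
{
    fp->count = nresident < L3_LINES ? nresident : L3_LINES;
    for (int i = 0; i < fp->count; i++)
        fp->tags[i] = resident[i];
}

/* Schedule hook: replay the saved tags as prefetches so the L3 is warm
 * before the guest starts executing.  A real implementation would hand
 * the tag list to a hardware prefetch engine. */
static void restore_on_schedule(const struct vm_footprint *fp)
{
    for (int i = 0; i < fp->count; i++) {
        unsigned long line_addr = fp->tags[i] * LINE_BYTES;
        printf("prefetch line 0x%lx\n", line_addr);  /* stand-in for a prefetch */
    }
}

int main(void)
{
    /* Pretend these lines were resident when the VM was switched out. */
    unsigned long resident[] = { 0x1000, 0x1001, 0x1002, 0x2200 };
    struct vm_footprint fp = { .count = 0 };

    snapshot_on_deschedule(&fp, resident, 4);
    restore_on_schedule(&fp);   /* warms the cache before the VM resumes */
    return 0;
}
```

The key design point the sketch tries to capture is that the footprint is recorded at switch-out and replayed at switch-in, so the prefetch traffic overlaps the context-switch latency rather than the guest's own execution.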
