Reducing Response Time with Preheated Caches

CPU performance is increasingly limited by thermal dissipation, and aggressive power management will soon become beneficial for performance. In particular, temporarily idle parts of the chip (including the caches) should be power-gated to reduce leakage power. Current CPUs already lose their cache state whenever the CPU is idle for an extended period, which causes a performance loss when execution resumes: the working set must be fetched from external memory, incurring a high number of cache misses. In a server system, the first network request after such an idle period suffers from increased response time. We present a technique that reduces this overhead by preheating the caches before the network request arrives at the server: our design predicts the working set of the server application by analyzing the cache contents after similar requests have been processed. As soon as an estimate of the working set is available, a predictable network architecture announces future incoming network packets to the server, which then loads the predicted working set into the cache. Our experiments show that, if this preheating step completes before the network packet arrives, the response-time overhead is reduced by an average of 80%.
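The core of the preheating step described above is simply touching the predicted working set before the request arrives, so the lines are already cache-resident when the handler runs. The sketch below illustrates this idea in C; the `working_set_t` record and `preheat_cache` function are hypothetical names invented for this example, not part of the paper's artifact, and the sketch assumes the working-set predictor supplies a list of cache-line addresses. It uses the GCC/Clang `__builtin_prefetch` builtin to pull each predicted line into the cache hierarchy.

```c
#include <stddef.h>

/* Hypothetical record of the predicted working set: cache-line
 * addresses observed after similar requests were processed. */
typedef struct {
    const void **lines;  /* predicted cache-line addresses */
    size_t       count;  /* number of predicted lines */
} working_set_t;

/* Preheat: touch each predicted line so it is resident before the
 * announced network packet arrives. The second argument of
 * __builtin_prefetch marks a read (0); the third is a temporal-locality
 * hint (3 = keep the line in all cache levels). */
static void preheat_cache(const working_set_t *ws)
{
    for (size_t i = 0; i < ws->count; i++)
        __builtin_prefetch(ws->lines[i], 0, 3);
}
```

In a real system the prefetch loop would be triggered by the network announcement, leaving enough lead time for the prefetches to complete before the packet is delivered to the application.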
