Effective Management of DRAM Bandwidth in Multicore Processors

Technology trends are leading to an increasing number of cores per chip, all of which inherently share DRAM bandwidth. On-chip cache resources are limited and, in many situations, cannot hold the working sets of the threads running on all these cores, which makes DRAM bandwidth a critical shared resource. Existing DRAM bandwidth management schemes support enforcing bandwidth shares but suffer from problems such as starvation, complexity, and unpredictable DRAM access latency. In this paper, we propose a DRAM bandwidth management scheme with two key features. First, the scheme avoids unexpectedly long latencies or starvation of memory requests, and it allows the OS to select the right combination of performance and strength of bandwidth share enforcement. Second, it provides a feedback-driven policy that adaptively tunes the bandwidth shares to achieve desired average latencies for memory accesses. This feature is useful under high contention and can be used to provide performance-level support for critical applications or to support service-level agreements in enterprise computing data centers.
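To make the idea of feedback-driven share tuning concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes a simple proportional update: each thread's share grows when its observed average latency exceeds its target and shrinks when it is below target, after which the shares are renormalized. The function name `tune_shares` and the parameters `gain` and `floor` are hypothetical and chosen only for illustration.

```python
def tune_shares(shares, observed, targets, gain=0.5, floor=0.05):
    """One illustrative feedback step over per-thread bandwidth shares.

    shares, observed, targets: dicts keyed by thread id.
    gain and floor are assumed tuning knobs, not values from the paper.
    """
    adjusted = {}
    for tid, share in shares.items():
        # Relative latency error: positive when the thread is slower than desired.
        error = (observed[tid] - targets[tid]) / targets[tid]
        # Proportional update; the floor keeps every share strictly positive
        # before renormalization.
        adjusted[tid] = max(floor, share * (1.0 + gain * error))
    # Renormalize so the shares again sum to 1.
    total = sum(adjusted.values())
    return {tid: s / total for tid, s in adjusted.items()}


# Example: thread "a" misses its latency target, thread "b" meets it.
new_shares = tune_shares(
    shares={"a": 0.5, "b": 0.5},
    observed={"a": 200.0, "b": 100.0},  # average latency, arbitrary units
    targets={"a": 100.0, "b": 100.0},
)
```

In this example the thread that misses its target ends up with the larger share. A real controller would also need to handle the case where the targets cannot all be met at once, which this sketch does not attempt.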
