Machine Learned Machines: Adaptive Co-optimization of Caches, Cores, and On-chip Network

Modern multicore architectures need runtime optimization to correct mismatches between the dynamic resource requirements of different processes and the resources allocated to them at runtime. Choosing among multiple optimizations at runtime is complex because their effects are non-additive, which makes the adaptiveness of machine learning techniques useful. We present a novel method, Machine Learned Machines (MLM), that uses online reinforcement learning (RL) to dynamically partition the last-level cache (LLC) and to apply dynamic voltage and frequency scaling (DVFS) to the core and the uncore (interconnection network and LLC). We show that this co-optimization yields a much lower energy-delay product (EDP) than any of the techniques applied individually. On average, MLM improves EDP by 19.6% and execution time by 2.6% over the baseline.
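To make the idea concrete, the following is a minimal sketch of how an online RL agent could jointly pick an LLC allocation and core/uncore frequencies with EDP as the cost. It is not the MLM implementation: the abstract does not specify the state encoding, action set, reward, or learning rule, so everything here is an assumption chosen for illustration. That includes tabular Q-learning, the three-knob action tuple, the two-phase workload state, and the toy power and delay model in `toy_interval`, whose constants are made up.

```python
import itertools
import random

# Hypothetical action space: (LLC ways, core GHz, uncore GHz).
WAYS = (2, 4, 8)
CORE_GHZ = (1.0, 2.0)
UNCORE_GHZ = (1.0, 2.0)
ACTIONS = list(itertools.product(WAYS, CORE_GHZ, UNCORE_GHZ))


def edp(energy, delay):
    """Energy-delay product; lower is better."""
    return energy * delay


def toy_interval(action, memory_bound):
    """Made-up stand-in for one measurement interval; returns (energy, delay)."""
    ways, core, uncore = action
    # Fewer ways means more misses; memory-bound phases pay far more for them.
    miss_cost = (8.0 / ways) * (2.0 if memory_bound else 0.1)
    delay = 1.0 / core + miss_cost / uncore
    # Cubic frequency terms roughly mimic voltage scaling with frequency.
    power = core ** 3 + 0.5 * uncore ** 3 + 0.2 * ways
    return power * delay, delay


class QAgent:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, alpha=0.2, gamma=0.5, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {}  # (state, action) -> value estimate, 0.0 if unseen

    def best(self, state):
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.best(state)

    def update(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] = old + self.alpha * (target - old)


def run(steps=5000, gamma=0.5, seed=0):
    """Train online: act, observe EDP for the interval, update, repeat."""
    agent = QAgent(gamma=gamma, seed=seed)
    phase_rng = random.Random(seed + 1)
    state = False  # state: is the workload currently memory bound?
    for _ in range(steps):
        action = agent.choose(state)
        energy, delay = toy_interval(action, state)
        # The workload phase flips occasionally, independent of the action.
        next_state = (not state) if phase_rng.random() < 0.1 else state
        agent.update(state, action, -edp(energy, delay), next_state)
        state = next_state
    return agent


if __name__ == "__main__":
    trained = run()
    for phase in (False, True):
        print("memory_bound =", phase, "->", trained.best(phase))
```

Because the reward is the negative EDP of the whole interval, the agent scores each combination of knobs as a unit rather than summing per-knob gains, which is one way to cope with the non-additive effects the abstract mentions. In a real system the state would come from hardware counters and the reward from measured energy and time, not from a closed-form model.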
