Toward More Efficient NoC Arbitration: A Deep Reinforcement Learning Approach

The network-on-chip (NoC) is a critical resource shared by various on-chip components. An efficient NoC arbitration policy is crucial for providing global fairness and improving system performance. In this preliminary work, we explore using deep reinforcement learning to guide the design of more efficient NoC arbitration policies. We frame arbitration as a self-learning decision-making process. Results show that the deep reinforcement learning approach can effectively reduce packet latency and has the potential to identify interesting features that could be used in more practical hardware designs.
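To make the framing of arbitration as a learning decision process concrete, the following is a minimal sketch and not the method evaluated in this work. It swaps the deep network for a plain Q-table (tabular Q-learning) and uses a toy single-output arbiter. The state encoding (capped head-packet ages), the reward (negative total waiting age), the arrival model, and every name and constant (`NUM_PORTS`, `MAX_AGE`, `step`, `train`) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

NUM_PORTS = 2  # hypothetical: input ports competing for one output
MAX_AGE = 3    # hypothetical: head-packet ages are capped to keep the table small


def step(ages, action, rng, arrival_prob=0.4):
    """Grant one port, age the packets still waiting, and inject new arrivals.

    ages[i] == 0 means port i is empty; otherwise it is the capped age of
    the packet at the head of port i. Returns (next_ages, reward), where the
    reward is minus the total age still waiting (an assumed latency proxy).
    """
    ages = list(ages)
    ages[action] = 0  # granted packet leaves; granting an empty port wastes the cycle
    for i in range(NUM_PORTS):
        if ages[i] > 0:
            ages[i] = min(ages[i] + 1, MAX_AGE)  # packet waits one more cycle
        elif rng.random() < arrival_prob:
            ages[i] = 1  # a new packet arrives at an empty port
    return tuple(ages), -sum(ages)


def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over arbitration grants."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * NUM_PORTS)  # state -> one value per port
    for _ in range(episodes):
        state = (0,) * NUM_PORTS
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(NUM_PORTS)  # explore
            else:
                action = max(range(NUM_PORTS), key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action, rng)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After `q = train()`, the greedy grant for a state `s` is `max(range(NUM_PORTS), key=lambda a: q[s][a])`. A deep reinforcement learning variant would replace the table `q` with a neural network that maps a richer feature vector to per-port values; which features matter is exactly what such an approach is meant to reveal.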
