Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Networks-on-chip (NoCs) have become the standard interconnect solution in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Because NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long NoC simulations further increase the execution time of evaluation frameworks that are already notoriously slow, which makes design-space exploration prohibitive. Existing analytical NoC models assume fair arbitration, so they cannot replace these simulations: industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach for constructing priority-aware analytical performance models from micro-architecture specifications and input traffic. Our approach decomposes the given NoC into individual queues with modified service times to enable accurate and scalable latency computations. Specifically, we introduce novel transformations, along with an algorithm that applies them iteratively to decompose the queuing system. Experimental evaluations using real architectures and applications show 97% accuracy and up to 2.5× speedup in full-system simulation.
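To illustrate why a fair-arbitration model can misestimate latency under priority scheduling, the following minimal sketch computes per-class mean waiting times for a single queue using the classical textbook formula for a non-preemptive priority M/G/1 queue. This is not the transformation-based method proposed here, whose details the abstract does not give. The function names, the continuous-time Poisson arrival assumption, and the example rates are all illustrative assumptions.

```python
def priority_waiting_times(arrival_rates, mean_service, second_moment_service):
    """Mean queueing delay per class in a non-preemptive priority M/G/1 queue.

    Classes are ordered from highest to lowest priority. Assumes Poisson
    arrivals (an illustrative assumption, not taken from the abstract).
    """
    # Mean residual service time seen by an arriving packet.
    residual = 0.5 * sum(
        lam * m2 for lam, m2 in zip(arrival_rates, second_moment_service)
    )
    waits = []
    sigma_prev = 0.0  # cumulative utilization of strictly higher classes
    for lam, mean in zip(arrival_rates, mean_service):
        sigma = sigma_prev + lam * mean  # utilization up to and including this class
        if sigma >= 1.0:
            raise ValueError("queue is unstable for this class and all lower ones")
        waits.append(residual / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


def fcfs_waiting_time(arrival_rates, mean_service, second_moment_service):
    """Mean queueing delay if all classes shared one first-come-first-served queue."""
    residual = 0.5 * sum(
        lam * m2 for lam, m2 in zip(arrival_rates, second_moment_service)
    )
    rho = sum(lam * mean for lam, mean in zip(arrival_rates, mean_service))
    if rho >= 1.0:
        raise ValueError("queue is unstable")
    return residual / (1.0 - rho)


# Hypothetical example: two classes, 0.2 packets/cycle each,
# deterministic 2-cycle service (mean 2, second moment 4).
rates = [0.2, 0.2]
means = [2.0, 2.0]
second_moments = [4.0, 4.0]

prio_waits = priority_waiting_times(rates, means, second_moments)
fair_wait = fcfs_waiting_time(rates, means, second_moments)
print("priority-aware waits per class:", prio_waits)
print("single shared-queue wait:", fair_wait)
```

In this hypothetical setting the high-priority class waits far less, and the low-priority class far more, than the single shared-queue figure suggests, which is the kind of gap a priority-aware model is meant to capture.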
