Characterization and modeling of multicast communication in cache-coherent manycore processors

Multicast traffic is characterized and modeled with an emphasis on scalability.
Intensity, concentration and burstiness increase with the system size.
Growing correlation suggests the use of prediction to optimize NoC designs.
Simple multicast source predictors achieve modest but promising accuracies.

The scalability of Network-on-Chip (NoC) designs has become a rising concern as we enter the manycore era. Multicast support is a specific yet relevant case within this context, mainly because NoCs perform poorly in the presence of this type of traffic. Multicast techniques are typically evaluated either with synthetic traffic, which is simplistic, or through full-system simulation, which is costly, because realistic traffic models that distinguish between unicast and multicast flows are lacking. To bridge this gap, this paper presents a trace-based multicast traffic characterization that explores the scaling trends of aspects such as the multicast intensity and the spatiotemporal injection distribution for different coherence schemes. This analysis serves as the basis for proposing the concept of multicast source prediction and for building a multicast traffic model. Both contributions pave the way for the development and accurate evaluation of advanced NoCs in the context of manycore computing.
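To make the notion of multicast source prediction more concrete, the following is a minimal sketch, not the predictors evaluated in this work. It assumes a trace already reduced to the ordered list of node IDs that inject each multicast message. The class names `LastSourcePredictor` and `MostFrequentSourcePredictor`, the `accuracy` helper and the toy trace are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


class LastSourcePredictor:
    """Hypothetical predictor: the next multicast comes from the same node as the last one."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def predict(self) -> Optional[int]:
        return self.last

    def update(self, source: int) -> None:
        self.last = source


class MostFrequentSourcePredictor:
    """Hypothetical predictor: the next multicast comes from the node that has sent the most so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self) -> Optional[int]:
        if not self.counts:
            return None
        # most_common(1) returns [(node, count)]; ties go to the node seen first.
        return self.counts.most_common(1)[0][0]

    def update(self, source: int) -> None:
        self.counts[source] += 1


def accuracy(predictor, sources: Iterable[int]) -> float:
    """Fraction of multicasts whose source was predicted correctly.

    Events for which no prediction is available yet count as misses.
    """
    hits = total = 0
    for src in sources:
        if predictor.predict() == src:
            hits += 1
        total += 1
        predictor.update(src)  # learn from the actual source after predicting
    return hits / total if total else 0.0


# Toy trace of multicast source node IDs (made up for illustration).
trace = [3, 3, 7, 3, 3, 3, 7, 7, 3, 3]
print(accuracy(LastSourcePredictor(), trace))         # 0.5
print(accuracy(MostFrequentSourcePredictor(), trace))  # 0.6
```

Both sketches only exploit the temporal correlation among multicast sources that the characterization points to; a windowed counter or a per-node history table would be natural refinements, under the same assumed trace format.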