A Generic Implementation of Barriers Using Optical Interconnects

Barriers have long been recognized as important performance-critical constructs in parallel applications. As a consequence, researchers have proposed fast implementations of barriers both in traditional electrical networks and in non-conventional networks such as optical NoCs. We prove in this paper that current protocols for barriers in optical NoCs are simplistic and cannot be trivially extended to accommodate normal events that arise in regular operation, such as the presence of multiple applications, context switches, thread migrations, and variability in the number of active threads. We propose two generic protocols for barriers that take all such cases into account, are fast, and try to minimize the number of messages sent over the NoC. One is a centralized protocol (suitable for fewer cores), and the other is a distributed protocol, which is scalable. For a suite of standard benchmarks, we found the latter to yield a mean speedup of 30.77% over a design that uses a hardware tree barrier. Our barrier implementation per se is roughly 2X and 20X faster than prior implementations that use transmission lines and electrical links, respectively.
