Electro-Photonic NoC Designs for Kilocore Systems

The increasing core count in manycore systems requires a correspondingly large network-on-chip (NoC) bandwidth to support the applications running on them. However, electrical link technology cannot provide this bandwidth in an energy-efficient manner. Photonic link technology has been proposed as a replacement to overcome this issue. This work explores the limits and opportunities of using photonic links to design the NoC architecture for a future Kilocore system. We explore three NoC designs: ElecNoC, an electrical concentrated two-dimensional (2D) mesh NoC; HybNoC, an electrical concentrated 2D mesh combined with a photonic multi-crossbar NoC; and PhotoNoC, a photonic multi-bus NoC. We consider both private and shared cache architectures and, to leverage the large bandwidth density of photonic links, we investigate the use of prefetching and aggressive non-blocking caches. Our analysis using contemporary Big Data workloads shows that non-blocking caches with a shared last-level cache (LLC) best leverage the large bandwidth of the photonic links in the Kilocore system. Moreover, compared to ElecNoC-based and HybNoC-based Kilocore systems, a PhotoNoC-based Kilocore system achieves up to 2.5× and 1.5× better performance, respectively, and supports up to 2.1× and 1.1× higher bandwidth, respectively, while dissipating comparable overall system power.
