SHARP

As the relentless quest for higher throughput and lower energy cost continues in heterogenous multicores, there is a strong demand for energy-efficient and high-performance Network-on-Chip (NoC) architectures. Heterogeneous architectures that can simultaneously utilize both the serialized nature of the CPU as well as the thread level parallelism of the GPU are gaining traction in the industry. A critical issue with heterogeneous architectures is finding an optimal way to utilize the shared resources such as the last level cache and NoC without hindering the performance of either the CPU or the GPU core. Photonic interconnects are a disruptive technology solution that has the potential to increase the bandwidth, reduce latency, and improve energy-efficiency over traditional metallic interconnects. In this article, we propose a CPU-GPU heterogeneous architecture called Shared Heterogeneous Architecture with Reconfigurable Photonic Network-on-Chip (SHARP) that clusters CPU and GPU cores around the same router and dynamically allocates bandwidth between the CPU and GPU cores based on application demands. The SHARP architecture is designed as a Single-Writer Multiple-Reader (SWMR) crossbar with reservation-assist to connect CPU/GPU cores that dynamically reallocates bandwidth using buffer utilization information at runtime. As network traffic exhibits temporal and spatial fluctuations due to application behavior, SHARP can dynamically reallocate bandwidth and thereby adapt to application demands. SHARP demonstrates 34% performance (throughput) improvement over a baseline electrical CMESH while consuming 25% less energy per bit. Simulation results have also shown 6.9% to 14.9% performance improvement over other flavors of the proposed SHARP architecture without dynamic bandwidth allocation.

[1]  Mahmut T. Kandemir,et al.  Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Douglas Coolbaugh,et al.  Demonstration of an optical chip-to-chip link in a 3D integrated electronic-photonic platform , 2015, ESSCIRC Conference 2015 - 41st European Solid-State Circuits Conference (ESSCIRC).

[3]  Sudhakar Yalamanchili,et al.  Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures , 2013, ACM Trans. Design Autom. Electr. Syst..

[4]  David A. Patterson,et al.  Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) , 2008 .

[5]  Jung Ho Ahn,et al.  Corona: System Implications of Emerging Nanophotonic Technology , 2008, 2008 International Symposium on Computer Architecture.

[6]  Yu Zhang,et al.  Firefly: illuminating future network-on-chip with nanophotonics , 2009, ISCA '09.

[7]  Mark Horowitz,et al.  1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[8]  David R. Kaeli,et al.  Leveraging Silicon-Photonic NoC for Designing Scalable GPUs , 2015, ICS.

[9]  Shaahin Hessabi,et al.  All-Optical Wavelength-Routed Architecture for a Power-Efficient Network on Chip , 2014, IEEE Transactions on Computers.

[10]  David A. Patterson,et al.  Computer Organization And Design: The Hardware/Software Interface , 1993 .

[11]  Venkatesh Akella,et al.  Addressing system-level trimming issues in on-chip nanophotonic networks , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[12]  Alyssa B. Apsel,et al.  Leveraging Optical Technology in Future Bus-based Chip Multiprocessors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[13]  Jeffrey S. Vetter,et al.  A Survey of CPU-GPU Heterogeneous Computing Techniques , 2015, ACM Comput. Surv..

[14]  John E. Bowers,et al.  Energy Efficient and Energy Proportional Optical Interconnects for Multi-Core Processors: Driving the Need for On-Chip Sources , 2014, IEEE Journal of Selected Topics in Quantum Electronics.

[15]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[16]  Ian O'Connor,et al.  Chameleon: Channel efficient Optical Network-on-Chip , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[17]  Zhongliang Chen,et al.  Exploring the heterogeneous design space for both performance and reliability , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[18]  Shekhar Y. Borkar Exascale Computing - A Fact or a Fiction? , 2013, IPDPS.

[19]  Mahmut T. Kandemir,et al.  Exploring heterogeneous NoC design space , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[20]  Edoardo Fusella,et al.  H2ONoC: A Hybrid Optical–Electronic NoC Based on Hybrid Topology , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[21]  Nam Sung Kim,et al.  GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[22]  Ahmed Louri,et al.  Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[23]  David A. Patterson,et al.  Computer Organization and Design, Fifth Edition: The Hardware/Software Interface , 2013 .

[24]  Sriram R. Vangal,et al.  A 2 Tb/s 6$\,\times\,$ 4 Mesh Network for a Single-Chip Cloud Computer With DVFS in 45 nm CMOS , 2011, IEEE Journal of Solid-State Circuits.

[25]  Radu Marculescu,et al.  Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms , 2016, 2016 International Conference on Compliers, Architectures, and Sythesis of Embedded Systems (CASES).

[26]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[27]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[28]  Amlan Ganguly,et al.  Heterogeneous photonic Network-on-Chip with dynamic bandwidth allocation , 2014, 2014 27th IEEE International System-on-Chip Conference (SOCC).

[29]  S. Borkar,et al.  An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.

[30]  Sriram R. Vangal,et al.  A 2 Tb/s 6 × 4 Mesh Network for a Single-Chip Cloud Computer With DVFS in 45 nm CMOS , 2011, VLSIC 2011.

[31]  Xiaowen Wu,et al.  SUOR: Sectioned Undirectional Optical Ring for Chip Multiprocessor , 2014, JETC.

[32]  Ketan Patel,et al.  High-power single-mode InGaAsP/InP laser diodes for pulsed operation , 2012, OPTO.

[33]  Chen Sun,et al.  DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[34]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[35]  Sudhakar Yalamanchili,et al.  Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture , 2013, J. Parallel Distributed Comput..

[36]  장훈,et al.  [서평]「Computer Organization and Design, The Hardware/Software Interface」 , 1997 .

[37]  Feng Liu,et al.  Dynamically managed data for CPU-GPU architectures , 2012, CGO '12.

[38]  Jaewon Lee,et al.  GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[39]  Christopher Batten,et al.  Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics , 2008, 2008 16th IEEE Symposium on High Performance Interconnects.