Energy-efficient VFI-partitioned multicore design using wireless NoC architectures

In recent years, designs with multiple Voltage Frequency Islands (VFIs) have increasingly made their way into both commercial and research multicore platforms. At the same time, the wireless Network-on-Chip (WiNoC) has emerged as an energy-efficient, high-bandwidth communication backbone for massively integrated multicore platforms. It therefore becomes possible to exploit the small-world effects induced by the wireless links of a WiNoC to achieve efficient inter-VFI data exchange. In this work, we demonstrate that WiNoCs can provide better latency and energy profiles than traditional mesh-like architectures for VFI-partitioned multicore designs. These gains in performance and energy efficiency come from the low-power wireless shortcuts combined with the small-world architecture. Our experimental results show energy improvements of up to 40% for multithreaded application benchmarks.
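The small-world argument can be made concrete with hop counts. The sketch below is a toy model under stated assumptions, not the topology, routing, or evaluation used in this work. It compares the average shortest-path hop count of an 8x8 mesh with the same mesh plus one-hop wireless shortcuts among four hypothetical wireless interfaces (WIs), one placed in each 4x4 quadrant, with each quadrant treated as a VFI. The WI placement, the model in which every WI pair is a single hop apart, and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def mesh_edges(n):
    """Undirected links of an n x n 2-D mesh; node id = row * n + col."""
    edges = set()
    for r in range(n):
        for c in range(n):
            u = r * n + c
            if c + 1 < n:
                edges.add((u, u + 1))  # link to east neighbour
            if r + 1 < n:
                edges.add((u, u + n))  # link to south neighbour
    return edges


def wireless_edges(wi_nodes):
    """Hypothetical model: every pair of wireless interfaces is one hop apart."""
    return {tuple(sorted(pair)) for pair in combinations(wi_nodes, 2)}


def adjacency(num_nodes, edges):
    """Build an adjacency list from a set of undirected edges."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def hops_from(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = [-1] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_hops(adj):
    """Mean shortest-path hop count over all distinct source/destination pairs."""
    total, pairs = 0, 0
    for s in range(len(adj)):
        dist = hops_from(adj, s)
        for t in range(s + 1, len(adj)):
            total += dist[t]
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    n = 8
    # Assumed placement: one WI inside each 4x4 quadrant (one quadrant per VFI).
    wis = [r * n + c for r, c in [(2, 2), (2, 5), (5, 2), (5, 5)]]
    mesh = adjacency(n * n, mesh_edges(n))
    winoc = adjacency(n * n, mesh_edges(n) | wireless_edges(wis))
    print(f"mesh  avg hops: {average_hops(mesh):.2f}")  # 5.33
    print(f"WiNoC avg hops: {average_hops(winoc):.2f}")
    # Corner-to-corner: 14 hops on the mesh vs 9 with the wireless shortcut.
    print("corner-to-corner:", hops_from(mesh, 0)[-1], "vs", hops_from(winoc, 0)[-1])
```

Only hop count is modeled here. A realistic comparison would also account for per-hop router energy, the energy per bit of the wireless link, and contention for the shared wireless medium.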
