P-sync: A Photonically Enabled Architecture for Efficient Non-local Data Access

Communication in multi- and many-core processors has long been a performance bottleneck because of the high cost of long-distance electrical transmission. Architectural constructs such as caches and novel interconnect topologies have partially remedied this difficulty, albeit at a steep cost in complexity. Even these measures are rendered ineffective by certain kinds of communication, most notably scatter and gather operations, which exhibit highly non-local data access patterns. Much work has examined how the increased bandwidth density afforded by chip-scale silicon photonic interconnect technologies affects computing, but photonics has additional properties that can be leveraged to greatly improve performance and energy efficiency under such difficult loads. This paper describes a novel synchronized global photonic bus and system architecture, called P-sync, that uses the distance independence of photonics to greatly improve performance on many important applications previously limited by electronic interconnect. The architecture is evaluated in the context of a non-local yet common application: the distributed Fast Fourier Transform. We show that tightly balancing computation and communication latency in P-sync yields high efficiency and upwards of a 6× performance increase on gather patterns, even when bandwidth is equalized.
