High-Bandwidth Low-Latency Approximate Interconnection Networks

Computational applications are subject to various kinds of numerical errors, ranging from deterministic round-off errors to soft errors caused by non-deterministic bit flips, which do not lead to application failure but corrupt application results. Non-deterministic bit flips are typically mitigated in hardware using various error-correcting codes (ECC). In practice, however, performance and cost concerns mean that these techniques do not guarantee error-free execution. On large-scale computing platforms, soft errors occur with non-negligible probability in RAM and on the CPU, and it has become clear that applications must tolerate them. For some applications, this tolerance is intrinsic, as result quality can remain acceptable even in the presence of soft errors (e.g., data analysis applications, multimedia applications). Tolerance can also be built into the application by resolving data corruptions in software during execution. By contrast, today's optical networks hold on to a rigid error-free standard, which limits the scalability of network performance. In this work we propose high-bandwidth, low-latency approximate networks with the following three features: (1) optical links that exploit multi-level quadrature amplitude modulation (QAM) to achieve high bandwidth, (2) avoidance of forward error correction (FEC), which makes optical links error-prone but affords lower latency, and (3) the use of symbol mapping coding between bit sequences and QAM symbols to ensure data integrity that is sufficient for practical soft-error-tolerant applications. Discrete-event simulation results for application benchmarks show that approximate networks achieve speedups of up to 2.94x when compared to conventional networks.
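To illustrate why the choice of symbol mapping matters on a link without FEC, the sketch below compares two ways of assigning 4-bit words to a 16-QAM constellation. The abstract does not specify the mapping used in this work, so Gray coding is used here purely as a well-known stand-in; the function names and the square 16-point grid are assumptions made for illustration. The code measures how many bits are corrupted, in the worst case, when a received symbol is mistaken for one of its nearest neighbors.

```python
# Illustrative sketch only: Gray coding is a standard mapping chosen for this
# example; it is not claimed to be the mapping proposed in the paper.

LEVELS = [-3, -1, 1, 3]  # assumed amplitude levels on each axis of 16-QAM


def gray(n: int) -> int:
    """Return the binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def build_qam16(use_gray: bool) -> dict:
    """Map every 4-bit word to an (I, Q) point of a square 16-QAM grid."""
    mapping = {}
    for i in range(4):
        for q in range(4):
            hi = gray(i) if use_gray else i  # two bits select the I level
            lo = gray(q) if use_gray else q  # two bits select the Q level
            mapping[(hi << 2) | lo] = (LEVELS[i], LEVELS[q])
    return mapping


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def worst_neighbor_bit_errors(mapping: dict) -> int:
    """Largest number of flipped bits when a symbol is confused with a
    nearest neighbor (one grid step along a single axis)."""
    worst = 0
    for bits_a, (ia, qa) in mapping.items():
        for bits_b, (ib, qb) in mapping.items():
            if abs(ia - ib) + abs(qa - qb) == 2:  # adjacent grid points
                worst = max(worst, hamming(bits_a, bits_b))
    return worst


if __name__ == "__main__":
    print("natural binary:", worst_neighbor_bit_errors(build_qam16(False)))
    print("Gray coded:    ", worst_neighbor_bit_errors(build_qam16(True)))
```

With the Gray-coded assignment, a nearest-neighbor symbol error flips exactly one bit, whereas the natural binary assignment can flip two. A mapping designed for soft-error-tolerant applications could go further, for example by controlling which bit positions are exposed to such errors, but that design is not described in the abstract.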
