Disaggregated optical-layer switching for optically composable disaggregated computing [Invited]

Disaggregated computing has been widely investigated as a way to sustain progress in computing performance and overcome the slowdown of Moore’s law. It involves the flexible and optimal interconnection of heterogeneous compute nodes, such as the CPU, GPU, xPU, memory, and storage, to offer an efficient computing environment for various applications. Such a scheme inherently requires high network performance, including low latency, high capacity, determinism, and energy efficiency, all of which are simultaneously achieved by introducing optical-layer switching. This paper presents the application of optical-layer-switching architectures to disaggregated computing. Networks associated with disaggregated computing are classified into intra- and interserver networks. Focusing on the intraserver network, a holistic concept of optically composable disaggregated computing (OCDC) is discussed, along with its technological direction toward future digital infrastructure (i.e., the computing continuum). To realize OCDC, scalable and flexible optical switch technologies, as well as mechanisms for their dynamic and automatic control and management, are indispensable. Previous studies have reported 32×32 silicon photonic switches that can form a nine-stage Clos topology with a radix of 131,072. They have also reported a machine-processable function description model for optical-layer switching, called the functional block-based disaggregation (FBD) model, which can automate the operation, administration, and management of any optical physical topology in cooperation with upper-layer operating systems. This study examines the applicability of both to OCDC. Based on reported wall-plug power consumption, it also shows that an OCDC system equipped with optical matrix switches, such as silicon photonic switches, offers better energy performance and scaling than the conventional approach of one big electrical packet switch. The potential applicability of the FBD model as an essential control and management system for the optical layer of OCDC is evaluated through numerical experiments.
