Paper Information - TensorExpress: In-Network Communication Scheduling for Distributed Deep Learning

TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, communicating parameters over the network is a key bottleneck. Previous studies proposed reordering tensor packets to reduce network blocking time, but network contention remains. TensorExpress mitigates this contention and reduces overall training time by scheduling tensor packets inside the network, using switches programmed in P4, a switch programming language. TensorExpress improves latency and network blocking time by up to 2.5 times and 2.44 times, respectively.

Chuck Yoo | Gyeongsik Yang | Yeonho Yoo | Minkoo Kang
