Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications
Liang Yuan Dai, J. Shalf, S. Rumley, Ziyi Zhu, Asher Novick, George Michelogiannakis, Zhenguo Wu, Keren Bergman, Madeleine Glick
[1] Asher Novick, et al. Dispersion-Engineered and Fabrication-Robust SOI Waveguides for Ultra-Broadband DWDM, 2023, Optical Fiber Communications Conference and Exhibition (OFC).
[2] Asher Novick, et al. Low-Loss Wide-FSR Miniaturized Racetrack Style Microring Filters for ≥1 Tbps DWDM, 2023, Optical Fiber Communications Conference and Exhibition (OFC).
[3] Liang Yuan Dai, et al. SiP Architecture for Accelerating Collective Communication in Distributed Deep Learning, 2023, Optical Fiber Communications Conference and Exhibition (OFC).
[4] Dan Li, et al. Impact of Synchronization Topology on DML Performance: Both Logical Topology and Physical Topology, 2022, IEEE/ACM Transactions on Networking.
[5] Liang Yuan Dai, et al. Streamlined Architecture for Thermal Control and Stabilization of Cascaded DWDM Micro-Ring Filters Bus, 2022, Optical Fiber Communications Conference and Exhibition (OFC).
[6] Zhihao Jia, et al. TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs, 2022, NSDI.
[7] Reza Yazdani Aminabadi, et al. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model, 2022, arXiv.
[8] Tushar Krishna, et al. Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models, 2021, arXiv.
[9] Bok Young Kim, et al. Integrated Kerr Frequency Comb-Driven Silicon Photonic Transmitter, 2021, arXiv:2109.10297.
[10] Wei Jiang, et al. Co-designing the Topology/Algorithm to Accelerate Distributed Training, 2021, IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom).
[11] Madeleine Glick, et al. SiP-ML: High-Bandwidth Optical Network Interconnects for Machine Learning Training, 2021, SIGCOMM.
[12] Truong Thao Nguyen, et al. Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale, 2021, IEICE Trans. Inf. Syst.
[13] Huaxi Gu, et al. X-NEST: A Scalable, Flexible, and High-Performance Network Architecture for Distributed Machine Learning, 2021, Journal of Lightwave Technology.
[14] T. Hoefler, et al. Flare: Flexible In-Network Allreduce, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Jiayi Huang, et al. Communication Algorithm-Architecture Co-Design for Distributed Deep Learning, 2021, ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[16] Nathan C. Abrams, et al. 3D-Integrated Multichip Module Transceiver for Terabit-Scale DWDM Interconnects, 2021, Optical Fiber Communications Conference and Exhibition (OFC).
[17] Vladimir Stojanovic, et al. 8 Tbps Co-Packaged FPGA and Silicon Photonics Optical IO, 2021, Optical Fiber Communications Conference and Exhibition (OFC).
[18] Bok Young Kim, et al. Error-Free Kerr Comb-Driven SiP Microdisk Transmitter, 2021, Conference on Lasers and Electro-Optics (CLEO).
[19] Olatunji Ruwase, et al. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Kiran Kumar Matam, et al. Software-hardware co-design for fast and scalable training of deep learning recommendation models, 2021, ISCA.
[21] Amar Phanishayee, et al. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] S. Yoo, et al. Silicon Photonic Flex-LIONS for Reconfigurable Multi-GPU Systems, 2021, Journal of Lightwave Technology.
[23] John E. Bowers, et al. A Scalable Multicast Hybrid Broadband Crossbar Wavelength Selective Switch for Datacenters, 2021, IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC).
[24] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[25] Nicholson T. Collier, et al. High-bypass Learning: Automated Detection of Tumor Cells That Significantly Impact Drug Response, 2020, IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S).
[26] Marco Canini, et al. Efficient sparse collective communication and its application to accelerate distributed deep learning, 2020, SIGCOMM.
[27] Tushar Krishna, et al. ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms, 2020, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[28] Qixiang Cheng, et al. Silicon Photonic 2.5D Multi-Chip Module Transceiver for High-Performance Data Centers, 2020, Journal of Lightwave Technology.
[29] Srinivas Sridharan, et al. Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms, 2020, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[30] Patricia Layec, et al. Performance Model and Design Rules for Optical Systems Employing Low-Resolution DAC/ADC, 2020, Journal of Lightwave Technology.
[31] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[32] A. Boes, et al. Ultra-dense optical data transmission over standard fibre with a single chip source, 2020, Nature Communications.
[33] Gang Sun, et al. PSNet: Reconfigurable network topology design for accelerating parameter server architecture based distributed machine learning, 2020, Future Gener. Comput. Syst.
[34] Bor-Yiing Su, et al. Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems, 2020, arXiv.
[35] Mark Wade, et al. TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O, 2020, IEEE Micro.
[36] Qixiang Cheng, et al. Experimental Demonstration of PAM-4 Transmission through Microring Silicon Photonic Clos Switch Fabric, 2020, Optical Fiber Communications Conference and Exhibition (OFC).
[37] Ryousei Takano, et al. On the Feasibility of Hybrid Electrical/Optical Switch Architecture for Large-Scale Training of Distributed Deep Learning, 2019, IEEE/ACM Workshop on Photonics-Optics Technology Oriented Networking, Information and Computing Systems (PHOTONICS).
[38] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, arXiv.
[39] Michal Lipson, et al. Turn-Key, High-Efficiency Kerr Comb Source, 2019, 2020 Conference on Lasers and Electro-Optics (CLEO).
[40] Qixiang Cheng, et al. Scalable Microring-Based Silicon Clos Switch Fabric With Switch-and-Select Stages, 2019, IEEE Journal of Selected Topics in Quantum Electronics.
[41] Xu Liu, et al. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect, 2019, IEEE Transactions on Parallel and Distributed Systems.
[42] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[43] Marc Snir, et al. Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems, 2018, IEEE/ACM Machine Learning in HPC Environments (MLHPC).
[44] Qixiang Cheng, et al. Recent advances in optical technologies for data centers: a review, 2018, Optica.
[45] Yuanzhou Yang, et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, arXiv.
[46] Qixiang Cheng, et al. Design Space Exploration of Microring Resonators in Silicon Photonic Interconnects: Impact of the Ring Curvature, 2018, Journal of Lightwave Technology.
[47] Xi Chen, et al. Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects, 2018, IEEE Custom Integrated Circuits Conference (CICC).
[48] Tomislav Drenski, et al. ADC & DAC — Technology Trends and Steps to Overcome Current Limitations, 2018, Optical Fiber Communications Conference and Exposition (OFC).
[49] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, arXiv.
[50] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[51] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[52] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[53] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Yaoliang Yu, et al. Distributed Machine Learning via Sufficient Factor Broadcasting, 2015, arXiv.
[55] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[56] Haitao Wu, et al. BCube: a high performance, server-centric network architecture for modular data centers, 2009, SIGCOMM '09.
[57] William J. Dally, et al. Technology-Driven, Highly-Scalable Dragonfly Topology, 2008, International Symposium on Computer Architecture (ISCA).
[58] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[59] Dharma P. Agrawal, et al. Generalized Hypercube and Hyperbus Structures for a Computer Network, 1984, IEEE Transactions on Computers.
[60] Bok Young Kim, et al. Petabit-Scale Silicon Photonic Interconnects With Integrated Kerr Frequency Combs, 2022, IEEE Journal of Selected Topics in Quantum Electronics.
[61] Jie Yang, et al. Training Deep Learning Recommendation Model with Quantized Collective Communications, 2020.
[62] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[63] H. Rong, et al. A 128 Gb/s PAM4 Silicon Microring Modulator With Integrated Thermo-Optic Resonance Tuning, 2019, Journal of Lightwave Technology.
[64] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[65] Kenji Tanaka, et al. Large-Message Size Allreduce at Wire Speed for Distributed Deep Learning, 2018.
[66] Jianping Wu, et al. BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training, 2018, NeurIPS.