Accelerating Distributed Reinforcement Learning with In-Switch Computing
Youjie Li | Iou-Jen Liu | Yifan Yuan | Deming Chen | Alexander G. Schwing | Jian Huang