Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Youjie Li | Mingchao Yu | Songze Li | Amir Salman Avestimehr | Nam Sung Kim | Alexander G. Schwing