Deep Learning at Scale
Marco Aldinucci | Maurizio Drocco | Iacopo Colonnelli | Paolo Viviani | Daniele Baccega