Empirical Investigation of Stale Value Tolerance on Parallel RNN Training