Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent
Christopher De Sa | Matthew Feldman | Christopher Ré | Kunle Olukotun