Samy Bengio | Jianmin Chen | Rajat Monga | Rafal Józefowicz
[1] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[2] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res..
[3] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[4] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[5] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[6] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[7] Luiz André Barroso, et al. The Tail at Scale, 2013, CACM.
[8] Michael I. Jordan, et al. Estimation, Optimization, and Parallelism when Data is Sparse, 2013, NIPS.
[9] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[10] Thorsten Brants, et al. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling, 2013, INTERSPEECH.
[11] Andreas Krause, et al. Advances in Neural Information Processing Systems (NIPS), 2014.
[12] Trishul M. Chilimbi, et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System, 2014, OSDI.
[13] Yaoliang Yu, et al. Petuum: A New Platform for Distributed Machine Learning on Big Data, 2013, IEEE Transactions on Big Data.
[14] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[15] Yuchen Zhang, et al. Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms, 2015, ArXiv.
[16] Alex Graves, et al. DRAW: A Recurrent Neural Network For Image Generation, 2015, ICML.
[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Kunle Olukotun, et al. Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms, 2015, NIPS.
[19] Yann LeCun, et al. Deep Learning with Elastic Averaging SGD, 2014, NIPS.
[20] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[21] Alexander J. Smola, et al. On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants, 2015, NIPS.
[22] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[23] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Yonghui Wu, et al. Exploring the Limits of Language Modeling, 2016, ArXiv.
[25] Raul Castro Fernandez, et al. Ako: Decentralised Deep Learning with Partial Gradient Exchange, 2016, SoCC.
[26] Ji Liu, et al. Staleness-Aware Async-SGD for Distributed Deep Learning, 2015, IJCAI.
[27] Xinyun Chen, et al. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016, ICLR.
[28] Pradeep Dubey, et al. Distributed Deep Learning Using Synchronous Stochastic Gradient Descent, 2016, ArXiv.
[29] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[30] Fabian Pedregosa, et al. ASAGA: Asynchronous Parallel SAGA, 2016, AISTATS.
[31] Dimitris S. Papailiopoulos, et al. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, 2015, SIAM J. Optim..