Two Tiered Distributed Training Algorithm for Acoustic Modeling
Pranav Ladkat, Oleg Rybakov, Radhika Arava, Sree Hari Krishnan Parthasarathi, I-Fan Chen, Nikko Strom
[1] Sree Hari Krishnan Parthasarathi, et al. Lessons from Building Acoustic Models with a Million Hours of Speech, 2019, ICASSP.
[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[3] Lei Xie, et al. Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling, 2017, INTERSPEECH.
[4] Qiang Huo, et al. Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering, 2016, ICASSP.
[5] Dan Alistarh, et al. QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent, 2016, arXiv.
[6] Sree Hari Krishnan Parthasarathi, et al. fMLLR based feature-space speaker adaptation of DNN acoustic models, 2015, INTERSPEECH.
[7] Xin Yuan, et al. Bandwidth optimal all-reduce algorithms for clusters of workstations, 2009, J. Parallel Distributed Comput.
[8] Pranav Ladkat, et al. Realizing Petabyte Scale Acoustic Modeling, 2019, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[9] Zheng Xu, et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach, 2016, ICML.
[10] Xiaohui Zhang, et al. Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging, 2014, ICLR.
[11] Trishul M. Chilimbi, et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System, 2014, OSDI.
[12] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[13] Sree Hari Krishnan Parthasarathi, et al. Robust Speech Recognition via Anchor Word Representations, 2017, INTERSPEECH.
[14] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[15] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[16] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[17] Yuanzhou Yang, et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, arXiv.
[18] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[19] Tara N. Sainath, et al. Parallel Deep Neural Network Training for Big Data on Blue Gene/Q, 2017, IEEE Transactions on Parallel and Distributed Systems.
[20] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[21] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[22] Geoffrey Zweig, et al. The Microsoft 2016 Conversational Speech Recognition System, 2017, ICASSP.
[23] Sree Hari Krishnan Parthasarathi, et al. Robust i-vector based adaptation of DNN acoustic model for speech recognition, 2015, INTERSPEECH.
[24] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[25] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.
[26] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[27] Yang You, et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, arXiv.
[28] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[29] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[30] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.