Asynchronous Decentralized Distributed Training of Acoustic Models
暂无分享,去创建一个
Brian Kingsbury | George Saon | Mingrui Liu | Ulrich Finkler | David S. Kung | Abdullah Kayi | Wei Zhang | Xiaodong Cui | David Kung | Brian Kingsbury | Mingrui Liu | G. Saon | Wei Zhang | Xiaodong Cui | Ulrich Finkler | Abdullah Kayi
[1] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[2] George Saon,et al. Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies , 2020, IEEE Signal Processing Magazine.
[3] Sree Hari Krishnan Parthasarathi,et al. Lessons from Building Acoustic Models with a Million Hours of Speech , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[5] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[6] Fan Zhou,et al. On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization , 2017, IJCAI.
[7] Brian Kingsbury,et al. Fast Training of Deep Neural Networks for Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[9] Tara N. Sainath,et al. Parallel deep neural network training for LVCSR tasks using blue gene/Q , 2014, INTERSPEECH.
[10] Brian Kingsbury,et al. A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition , 2019, INTERSPEECH.
[11] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[12] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[13] Robert M. Gray,et al. Toeplitz and Circulant Matrices: A Review , 2005, Found. Trends Commun. Inf. Theory.
[14] David Miller,et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.
[15] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[16] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[17] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[18] Wei Zhang,et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent , 2017, ICML.
[19] Suyog Gupta,et al. Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study , 2015, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[20] Pranav Ladkat,et al. Realizing Petabyte Scale Acoustic Modeling , 2019, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[21] Tao Wang,et al. Scale MLPerf-0.6 models on Google TPU-v3 Pods , 2019, ArXiv.
[22] Kai Chen,et al. Parallelizing Adam Optimizer with Blockwise Model-Update Filtering , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Sanjeev Khudanpur,et al. Parallel training of DNNs with Natural Gradient and Parameter Averaging , 2014 .
[24] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[26] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[27] Rong Zheng,et al. Asynchronous stochastic gradient descent for DNN training , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[28] Xiaodong Cui,et al. Distributed Deep Learning Strategies for Automatic Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Dong Yu,et al. On parallelizability of stochastic gradient descent for speech DNNS , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Adnan Haider,et al. Sequence training of DNN acoustic models with natural gradient , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Brian Kingsbury,et al. Improving Efficiency in Large-Scale Decentralized Distributed Training , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Qiang Huo,et al. Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Georg Heigold,et al. Asynchronous stochastic optimization for sequence training of deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Nikko Strom,et al. Scalable distributed DNN training using commodity GPU cloud computing , 2015, INTERSPEECH.
[35] Geoffrey Zweig,et al. Toward Human Parity in Conversational Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[36] Georg Heigold,et al. Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[37] James Demmel,et al. ImageNet Training in Minutes , 2017, ICPP.
[38] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[39] Yuanzhou Yang,et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes , 2018, ArXiv.
[40] Chao Zhang,et al. A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training , 2021, Neural Networks.
[41] Xiaodong Cui,et al. English Conversational Telephone Speech Recognition by Humans and Machines , 2017, INTERSPEECH.
[42] Guangsen Wang,et al. A Random Gossip BMUF Process for Neural Language Modeling , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).