Improving Knowledge Distillation of CTC-Trained Acoustic Models With Alignment-Consistent Ensemble and Target Delay
Haisong Ding | Qiang Huo | Kai Chen
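The title refers to knowledge distillation of CTC-trained acoustic models. As a minimal sketch of the frame-level distillation objective introduced by Hinton et al. [18], the snippet below computes a temperature-softened KL divergence between teacher and student per-frame output distributions; the function names and the choice of plain NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the label dimension.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_kd_loss(teacher_logits, student_logits, T=2.0):
    """Frame-wise KL(teacher || student) with temperature T,
    averaged over frames. Inputs: (num_frames, num_labels) logits.
    The T^2 factor keeps gradient magnitudes comparable across
    temperatures, following Hinton et al."""
    p = softmax(teacher_logits / T)            # teacher soft targets
    log_q = np.log(softmax(student_logits / T))
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)  # per-frame KL
    return float(kl.mean() * T * T)
```

Sequence-level variants (e.g. [34], [35]) replace this per-frame match with losses defined over label sequences or N-best hypotheses.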
[1] Rich Caruana,et al. Model compression, 2006, KDD '06.
[2] Jinyu Li,et al. Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[3] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.
[4] Brian Kingsbury,et al. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[6] Yu Hu,et al. Nonrecurrent Neural Structure for Long-Term Dependence , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[7] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[8] S. Chiba,et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.
[9] Andrew W. Senior,et al. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition , 2014, ArXiv.
[10] Yevgen Chebotar,et al. Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition , 2016, INTERSPEECH.
[11] Yifan Gong,et al. Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.
[12] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[13] Bhuvana Ramabhadran,et al. Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.
[14] Mark J. F. Gales,et al. Sequence Student-Teacher Training of Deep Neural Networks , 2016, INTERSPEECH.
[15] Geoffrey Zweig,et al. Toward Human Parity in Conversational Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[16] Hisashi Kawai,et al. An Investigation of a Knowledge Distillation Method for CTC Acoustic Models , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[19] Yashesh Gaur,et al. Reducing Bias in Production Speech Models , 2017, ArXiv.
[20] David Miller,et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.
[21] Kai Yu,et al. Phone Synchronous Speech Recognition With CTC Lattices , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Geoffrey Zweig,et al. Advances in all-neural speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Johan Schalkwyk,et al. Learning acoustic frame labeling for speech recognition with recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Kai Chen,et al. Training Deep Bidirectional LSTM Acoustic Model for LVCSR by a Context-Sensitive-Chunk BPTT Approach , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[26] Chengzhu Yu,et al. An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[27] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[28] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[29] Kartik Audhkhasi,et al. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation , 2019, INTERSPEECH.
[30] Thomas G. Dietterich. Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.
[31] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[32] Jürgen Schmidhuber,et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.
[33] Kartik Audhkhasi,et al. Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[34] Hisashi Kawai,et al. Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Alexander M. Rush,et al. Sequence-Level Knowledge Distillation , 2016, EMNLP.
[36] Michelle Guo,et al. Knowledge distillation for small-footprint highway networks , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[38] Jinyu Li,et al. Improved training for online end-to-end speech recognition systems , 2017, INTERSPEECH.
[39] Steve Renals,et al. Small-Footprint Highway Deep Neural Networks for Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[40] Daniel Jurafsky,et al. First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs , 2014, ArXiv.
[41] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit, 2011.
[42] Kai Yu,et al. Knowledge Distillation for Sequence Model , 2018, INTERSPEECH.
[43] Yifan Gong,et al. Speaker Adaptation for End-to-End CTC Models , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[44] Naoyuki Kanda,et al. Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[45] Haisong Ding,et al. Compression of CTC-Trained Acoustic Models by Dynamic Frame-Wise Distillation or Segment-Wise N-Best Hypotheses Imitation , 2019, INTERSPEECH.
[46] Tara N. Sainath,et al. Lower Frame Rate Neural Network Acoustic Models , 2016, INTERSPEECH.
[47] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[48] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[49] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Tara N. Sainath,et al. Acoustic modelling with CD-CTC-SMBR LSTM RNNs, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[51] Xiaohui Zhang,et al. A diversity-penalizing ensemble training method for deep learning , 2015, INTERSPEECH.
[52] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[53] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[54] Bhuvana Ramabhadran,et al. Efficient Knowledge Distillation from an Ensemble of Teachers , 2017, INTERSPEECH.
[55] Yu Zhang,et al. Highway long short-term memory RNNs for distant speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[57] Bhuvana Ramabhadran,et al. Training variance and performance evaluation of neural networks in speech , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Jürgen Schmidhuber,et al. Recurrent nets that time and count , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.