Acoustic Modeling Based on Deep Learning for Low-Resource Speech Recognition: An Overview
Xia Zhao | Chongchong Yu | Meng Kang | Yunbing Chen | Jiajia Wu
[1] Tara N. Sainath,et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Raymond Ptucha,et al. Synthetic Data Augmentation for Improving Low-Resource ASR , 2019, 2019 IEEE Western New York Image and Signal Processing Workshop (WNYISPW).
[3] Maxim Korenevsky,et al. Exploring End-to-End Techniques for Low-Resource Speech Recognition , 2018, SPECOM.
[4] Xiaodong Cui,et al. Data Augmentation for Deep Neural Network Acoustic Modeling , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] Li Deng,et al. An Overview of Deep-Structured Learning for Information Processing , 2011 .
[6] William Hartmann,et al. Learning from the Best: A Teacher-student Multilingual Framework for Low-resource Languages , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Shaohe Lv,et al. An Overview of End-to-End Automatic Speech Recognition , 2019, Symmetry.
[8] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[9] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[10] Hung-yi Lee,et al. Meta Learning for End-To-End Low-Resource Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Shuang Xu,et al. Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition , 2017, INTERSPEECH.
[12] Joaquin Vanschoren,et al. Meta-Learning: A Survey , 2018, Automated Machine Learning.
[13] Richard S. Zemel,et al. Prototypical Networks for Few-shot Learning , 2017, NIPS.
[14] Jianhua Tao,et al. Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[16] Hynek Hermansky,et al. Robust speech recognition in unknown reverberant and noisy conditions , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[17] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[19] Lukás Burget,et al. BUT OpenSAT 2017 Speech Recognition System , 2018, INTERSPEECH.
[20] Peter Bell,et al. Structured output layer with auxiliary targets for context-dependent acoustic modelling , 2015, INTERSPEECH.
[21] Hirokazu Kameoka,et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks , 2017, ArXiv.
[22] Bhuvana Ramabhadran,et al. End-to-end speech recognition and keyword search on low-resource languages , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Jason Duncan,et al. Overview of the DARPA LORELEI Program , 2017, Machine Translation.
[25] Hulya Yalcin,et al. Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS , 2019, 2019 16th International Multi-Conference on Systems, Signals & Devices (SSD).
[26] Dirk Van Compernolle,et al. A study of rank-constrained multilingual DNNs for low-resource ASR , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Chongchong Yu,et al. Cross-Language End-to-End Speech Recognition Research Based on Transfer Learning for the Low-Resource Tujia Language , 2019, Symmetry.
[28] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[29] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[30] Jia Liu,et al. Gated convolutional networks based hybrid acoustic models for low resource speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Chng Eng Siong,et al. A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition , 2015, INTERSPEECH.
[32] Jianhua Tao,et al. Adversarial Multilingual Training for Low-Resource Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[35] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Kou Tanaka,et al. StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[37] Léon Bottou,et al. Wasserstein GAN , 2017, ArXiv.
[38] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[39] Wu Chou,et al. Robust decision tree state tying for continuous speech recognition , 2000, IEEE Trans. Speech Audio Process..
[40] Cheung-Chi Leung,et al. Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] A G Ramakrishnan,et al. Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages , 2019, 2019 National Conference on Communications (NCC).
[42] Yoshua Bengio,et al. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.
[43] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[44] Yoshua Bengio,et al. Generative Adversarial Networks , 2014, ArXiv.
[45] Shuang Xu,et al. Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling , 2016, INTERSPEECH.
[46] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Kai Yu,et al. Speaker Augmentation for Low Resource Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Dong Yu,et al. Recent progresses in deep learning based acoustic models , 2017, IEEE/CAA Journal of Automatica Sinica.
[49] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[50] Boi Faltings,et al. Meta-Learning for Low-resource Natural Language Generation in Task-oriented Dialogue Systems , 2019, IJCAI.
[51] Hermann Ney,et al. Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages , 2014, INTERSPEECH.
[52] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[53] Shuang Xu,et al. Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages , 2018, ArXiv.
[54] Hui Wang,et al. Multilingual Convolutional, Long Short-Term Memory, Deep Neural Networks for Low Resource Speech Recognition , 2017 .
[55] Jeff Z. Ma,et al. Optimizing Multilingual Knowledge Transfer for Time-Delay Neural Networks with Low-Rank Factorization , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Shuang Xu,et al. Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese , 2018, INTERSPEECH.
[58] Yong Wang,et al. Meta-Learning for Low-Resource Neural Machine Translation , 2018, EMNLP.
[59] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[60] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[61] Junqing Yu,et al. Investigation of Various Hybrid Acoustic Modeling Units via a Multitask Learning and Deep Neural Network Technique for LVCSR of the Low-Resource Language, Amharic , 2019, IEEE Access.
[62] Chengyi Wang,et al. Semantic Mask for Transformer based End-to-End Speech Recognition , 2020, INTERSPEECH.
[63] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[64] Tanja Schultz,et al. Automatic speech recognition for under-resourced languages: A survey , 2014, Speech Commun..
[65] Richard M. Schwartz,et al. Two-Stage Data Augmentation for Low-Resourced Speech Recognition , 2016, INTERSPEECH.
[66] Jia Liu,et al. Advanced recurrent network-based hybrid acoustic models for low resource speech recognition , 2018, EURASIP J. Audio Speech Music. Process..
[67] Mark J. F. Gales,et al. Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED , 2014, SLTU.
[68] Florian Metze,et al. Domain Robust Feature Extraction for Rapid Low Resource ASR Development , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[69] Luke S. Zettlemoyer,et al. Transformers with convolutional context for ASR , 2019, ArXiv.
[70] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[71] Hari Krishna Vydana,et al. An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018 , 2018, INTERSPEECH.
[72] Peter Bell,et al. Learning to adapt: a meta-learning approach for speaker adaptation , 2018, INTERSPEECH.
[73] Geoffrey E. Hinton,et al. Deep Belief Networks for phone recognition , 2009 .
[74] Srinivasan Umesh,et al. Addressing data sparsity in DNN acoustic modeling , 2017, 2017 Twenty-third National Conference on Communications (NCC).
[75] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[76] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[77] Dimitri Palaz,et al. Towards End-to-End Speech Recognition , 2016 .
[78] Jianhua Tao,et al. Language-Adversarial Transfer Learning for Low-Resource Speech Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[79] Meng Cai,et al. Convolutional maxout neural networks for low-resource speech recognition , 2014, The 9th International Symposium on Chinese Spoken Language Processing.
[80] Bin Ma,et al. Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[81] Naoyuki Kanda,et al. Elastic spectral distortion for low resource speech recognition with deep neural networks , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[82] Thomas Niesler,et al. Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders , 2018, INTERSPEECH.
[83] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[84] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[85] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[86] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[87] Brian Kan-Wing Mak,et al. Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[88] Yu Tsao,et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks , 2017, INTERSPEECH.
[89] Xu Wang,et al. A frequency warping approach for vocal tract length normalization , 2004, Proceedings 7th International Conference on Signal Processing (ICSP '04).
[90] Shrikanth Narayanan,et al. A system for the 2019 Sentiment, Emotion and Cognitive State Task of DARPA's LORELEI project , 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII).
[91] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[92] Andreas Stolcke,et al. MLLR transforms as features in speaker recognition , 2005, INTERSPEECH.
[93] Lukás Burget,et al. Analysis of Multilingual BLSTM Acoustic Model on Low and High Resource Languages , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[94] Bin Ma,et al. Efficient methods to train multilingual bottleneck feature extractors for low resource keyword search , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[95] Shuang Xu,et al. A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese , 2018, ICONIP.
[96] Stephanie Strassel,et al. LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages , 2016, LREC.
[97] Aiying Zhang. Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection (基于多语言语音数据选择的资源稀缺蒙语语音识别研究) , 2018, Computer Science (计算机科学).
[98] Tanvina Patel,et al. TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages , 2018, INTERSPEECH.
[99] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[100] William Chan,et al. Deep convolutional neural networks for acoustic modeling in low resource languages , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[101] Mark J. F. Gales,et al. Data augmentation for low resource languages , 2014, INTERSPEECH.
[102] 拓海 杉山,et al. Study report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks” , 2017 .
[103] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[104] Jasha Droppo,et al. Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.