End-to-end Monaural Multi-speaker ASR System without Pretraining
暂无分享,去创建一个
Shinji Watanabe | Kai Yu | Yanmin Qian | Xuankai Chang | Y. Qian | Shinji Watanabe | Kai Yu | Xuankai Chang
[1] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[2] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] John R. Hershey,et al. Monaural speech separation and recognition challenge , 2010, Comput. Speech Lang..
[5] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[6] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[7] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Dong Yu,et al. Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Kai Yu,et al. Phone Synchronous Speech Recognition With CTC Lattices , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] Dong Yu,et al. Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Jean Carletta,et al. The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.
[12] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[13] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[14] Jasha Droppo,et al. Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[16] Yoshua Bengio,et al. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.
[17] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Hermann Ney,et al. Improved training of end-to-end attention models for speech recognition , 2018, INTERSPEECH.
[20] Dong Yu,et al. Recognizing Multi-talker Speech with Permutation Invariant Training , 2017, INTERSPEECH.
[21] Dong Yu,et al. Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks , 2018, INTERSPEECH.
[22] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[23] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[24] Jonathan Le Roux,et al. A Purely End-to-End System for Multi-speaker Speech Recognition , 2018, ACL.
[25] John R. Hershey,et al. Joint CTC/attention decoding for end-to-end speech recognition , 2017, ACL.
[26] Dong Yu,et al. Past review, current progress, and challenges ahead on the cocktail party problem , 2018, Frontiers of Information Technology & Electronic Engineering.
[27] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.
[28] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[29] Jonathan Le Roux,et al. End-to-End Multi-Speaker Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Dong Yu,et al. Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training , 2017, Speech Commun..
[31] Geoffrey Zweig,et al. The microsoft 2016 conversational speech recognition system , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Jon Barker,et al. The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[33] Dong Yu,et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[34] Jinyu Li,et al. Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.