Optimization of Speaker-Aware Multichannel Speech Extraction with ASR Criterion
Tomohiro Nakatani, Marc Delcroix, Jan Černocký, Keisuke Kinoshita, Takuya Higuchi, Kateřina Žmolíková
[1] Reinhold Häb-Umbach, et al. On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming, 2017, ArXiv.
[2] Tomohiro Nakatani, et al. Learning speaker representation for neural network based multichannel speaker extraction, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Reinhold Häb-Umbach, et al. BeamNet: End-to-end training of a beamformer-supported multi-channel ASR system, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Shinji Watanabe, et al. Sequence summarizing neural network for speaker adaptation, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Daniel P. W. Ellis, et al. Model-Based Expectation-Maximization Source Separation and Localization, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Reinhold Häb-Umbach, et al. Neural network based spectral mask estimation for acoustic beamforming, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Zhuo Chen, et al. Deep clustering: Discriminative embeddings for segmentation and separation, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Reinhold Häb-Umbach, et al. Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings, 2017, INTERSPEECH.
[9] Hiroshi Sawada, et al. Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Liang Lu, et al. Deep beamforming networks for multi-channel speech recognition, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Tomohiro Nakatani, et al. Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures, 2017, INTERSPEECH.
[12] Jon Barker, et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[13] Richard M. Stern, et al. Likelihood-maximizing beamforming for robust hands-free speech recognition, 2004, IEEE Transactions on Speech and Audio Processing.
[14] Reinhold Häb-Umbach, et al. A generic neural acoustic beamforming architecture for robust multi-channel speech processing, 2017, Comput. Speech Lang.
[15] Dong Yu, et al. Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training, 2017, Speech Commun.
[16] Dong Yu, et al. Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] Tara N. Sainath, et al. Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] Jonathan Le Roux, et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Scott Rickard, et al. Blind separation of speech mixtures via time-frequency masking, 2004, IEEE Transactions on Signal Processing.
[20] Jesper Jensen, et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Chng Eng Siong, et al. On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Jonathan Le Roux, et al. Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks, 2016, INTERSPEECH.
[23] Chengzhu Yu, et al. Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Jun Du, et al. On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones, 2017, INTERSPEECH.
[25] Reinhold Häb-Umbach, et al. Optimizing neural-network supported acoustic beamforming by algorithmic differentiation, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Reinhold Häb-Umbach, et al. Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Tomohiro Nakatani, et al. Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources, 2017, INTERSPEECH.
[28] Reinhold Häb-Umbach, et al. Blind speech separation employing directional statistics in an Expectation Maximization framework, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Dong Yu, et al. Recognizing Multi-talker Speech with Permutation Invariant Training, 2017, INTERSPEECH.