An iterative mask estimation approach to deep learning based multi-channel speech recognition
Jun Du | Chin-Hui Lee | Lei Sun | Jingdong Chen | Feng Ma | Hai-Kun Wang | Yanhui Tu
[1] Emmanuel Vincent, et al. Multichannel Audio Source Separation With Deep Neural Networks, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] Walter Kellermann, et al. A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics, 2005, IEEE Transactions on Speech and Audio Processing.
[3] Jun Du, et al. An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech, 2017, Comput. Speech Lang.
[4] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.
[5] Marc Moonen, et al. Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction, 2003, Signal Process.
[6] Jun Du, et al. Joint training of front-end and back-end deep neural networks for robust speech recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[8] Reinhold Häb-Umbach, et al. BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[9] A.V. Oppenheim, et al. Enhancement and bandwidth compression of noisy speech, 1979, Proceedings of the IEEE.
[10] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.
[11] Christian Jutten, et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture, 1991, Signal Process.
[12] Tara N. Sainath, et al. Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[14] Reinhold Häb-Umbach, et al. Speech Enhancement With a GSC-Like Structure Employing Eigenvector-Based Transfer Function Ratios Estimation, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Chng Eng Siong, et al. Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[16] Chng Eng Siong, et al. On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] B.D. Van Veen, et al. Beamforming: a versatile approach to spatial filtering, 1988, IEEE ASSP Magazine.
[18] Hermann Ney, et al. The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation, 2016.
[19] Jon Rigelsford, et al. Handbook of Neural Networks for Speech Processing, 2003.
[20] Jörg Meyer, et al. Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[21] Tim Brookes, et al. On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis, 2014.
[22] Jun Du, et al. Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition, 2016, EURASIP J. Adv. Signal Process.
[23] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[24] Thomas F. Quatieri, et al. Speech analysis/Synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.
[25] Jun Du, et al. Online LSTM-based Iterative Mask Estimation for Multi-Channel Speech Enhancement and ASR, 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[26] Xavier Anguera Miró, et al. Acoustic Beamforming for Speaker Diarization of Meetings, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Jacob Benesty, et al. A Study of the LCMV and MVDR Noise Reduction Filters, 2010, IEEE Transactions on Signal Processing.
[28] Jun Du, et al. Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[30] Lukás Burget, et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.
[31] Jun Du, et al. On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones, 2017, INTERSPEECH.
[32] J. Capon. High-resolution frequency-wavenumber spectrum analysis, 1969.
[33] Geoffrey E. Hinton, et al. Acoustic Modeling Using Deep Belief Networks, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Jon Barker, et al. The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes, 2017, Comput. Speech Lang.
[35] Jon Barker, et al. An analysis of environment, microphone and data simulation mismatches in robust speech recognition, 2017, Comput. Speech Lang.
[36] Björn W. Schuller, et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR, 2015, LVA/ICA.
[37] Israel Cohen, et al. Convolutive Transfer Function Generalized Sidelobe Canceler, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[38] Jun Du, et al. An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[39] X. Mestre, et al. On diagonal loading for minimum variance beamformers, 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).
[40] P. Stoica, et al. Robust Adaptive Beamforming, 2013.
[41] Zhengyou Zhang, et al. Maximum Likelihood Sound Source Localization and Beamforming for Directional Microphone Arrays in Distributed Meetings, 2008, IEEE Transactions on Multimedia.
[42] John R. Hershey, et al. Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming, 2017, IEEE Journal of Selected Topics in Signal Processing.
[43] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.
[44] Bo Ren, et al. Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[45] O. Hoshuyama, et al. A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[46] Israel Cohen, et al. Speech enhancement for non-stationary noise environments, 2001, Signal Process.
[47] Amr El-Keyi, et al. Robust adaptive beamforming based on the Kalman filter, 2005, IEEE Transactions on Signal Processing.
[48] Y. Ephraim, et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, 1984.
[49] Jun Du, et al. A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions, 2008, INTERSPEECH.
[50] Douglas L. Jones, et al. Blind location and separation of callers in a natural chorus using a microphone array, 2009, The Journal of the Acoustical Society of America.
[51] Tomohiro Nakatani, et al. Integrating DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[53] Takuya Yoshioka, et al. Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Heping Ding, et al. A Region-Growing Permutation Alignment Approach in Frequency-Domain Blind Source Separation of Speech Mixtures, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[55] Cong Liu, et al. The USTC-iFlytek System for CHiME-4 Challenge, 2016.
[56] Rémi Gribonval, et al. Oracle estimators for the benchmarking of source separation algorithms, 2007, Signal Process.
[57] Reinhold Häb-Umbach, et al. Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Jun Du, et al. A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Dirk Van Compernolle, et al. A family of MLP based nonlinear spectral estimators for noise reduction, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[60] DeLiang Wang, et al. Towards Scaling Up Classification-Based Speech Separation, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[61] S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.
[62] Chengzhu Yu, et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).