The STC System for the CHiME-6 Challenge
Yuri Y. Khokhlov | I. Medennikov | M. Korenevsky | Tatiana Prisyach | Mariya Korenevskaya | Ivan Sorokin | Tatiana Timofeeva | Anton Mitrofanov | A. Andrusenko | Ivan Podluzhny | A. Laptev | A. Romanenko
[1] L. J. Griffiths, et al. An alternative approach to linearly constrained adaptive beamforming, 1982.
[2] DeLiang Wang, et al. Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design, 2008.
[3] Jacob Benesty, et al. On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[4] Marco Matassoni, et al. An auditory based modulation spectral feature for reverberant speech recognition, 2010, INTERSPEECH.
[5] Haihua Xu, et al. Minimum Bayes Risk decoding and system combination based on a recursion for edit distance, 2011, Comput. Speech Lang.
[6] Tomohiro Nakatani, et al. Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[7] George Saon, et al. Speaker adaptation of neural network acoustic models using i-vectors, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[8] Andrew W. Senior, et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling, 2014, INTERSPEECH.
[9] Martin Wolf, et al. Channel selection measures for multi-microphone speech recognition, 2014, Speech Commun.
[10] Yiming Wang, et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI, 2016, INTERSPEECH.
[11] Sanjeev Khudanpur, et al. Acoustic Modelling from the Signal Domain Using CNNs, 2016, INTERSPEECH.
[12] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, INTERSPEECH.
[13] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[14] Reinhold Haeb-Umbach, et al. Front-end processing for the CHiME-5 dinner party scenario, 2018, 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018).
[15] Hermann Ney, et al. The RWTH/UPB system combination for the CHiME 2018 Workshop, 2018, 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018).
[16] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Jiqing Han, et al. Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training, 2018, arXiv.
[18] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.
[19] Sergey Novoselov, et al. On deep speaker embeddings for text-independent speaker recognition, 2018, Odyssey.
[20] Reinhold Haeb-Umbach, et al. NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing, 2018, ITG Symposium on Speech Communication.
[21] Jon Barker, et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines, 2018, INTERSPEECH.
[22] Nima Mesgarani, et al. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Kyu J. Han, et al. Multi-Stride Self-Attention for Speech Recognition, 2019, INTERSPEECH.
[24] Naoyuki Kanda, et al. Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR, 2019, INTERSPEECH.
[25] Naoyuki Kanda, et al. Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition, 2019, INTERSPEECH.
[26] Reinhold Haeb-Umbach, et al. An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] Naoyuki Kanda, et al. Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[29] Naoyuki Kanda, et al. End-to-End Neural Speaker Diarization with Self-Attention, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[30] Kyu J. Han, et al. State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention with Dilated 1D Convolutions, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Shrikanth Narayanan, et al. Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap, 2020, IEEE Signal Processing Letters.
[32] Yi Luo, et al. End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Aleksei Romanenko, et al. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario, 2020, INTERSPEECH.
[34] Galina Lavrentyeva, et al. Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances, 2020, Odyssey.
[35] Ivan Provilkov, et al. BPE-Dropout: Simple and Effective Subword Regularization, 2019, ACL.
[36] Tomohiro Nakatani, et al. Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Jon Barker, et al. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings, 2020, 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020).