SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
暂无分享,去创建一个
Tomohiro Nakatani | Marc Delcroix | Keisuke Kinoshita | Tsubasa Ochiai | Lukáš Burget | Jan Černocký | Kateřina Žmolíková | L. Burget | T. Nakatani | K. Kinoshita | Kateřina Žmolíková | Marc Delcroix | Tsubasa Ochiai | J. Černocký
[1] Sree Hari Krishnan Parthasarathi,et al. Anchored Speech Detection , 2016, INTERSPEECH.
[2] Shinji Watanabe,et al. Sequence summarizing neural network for speaker adaptation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Tomohiro Nakatani,et al. Single Channel Target Speaker Extraction and Recognition with Speaker Beam , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Dong Yu,et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] Zhong-Qiu Wang,et al. Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[8] John R. Hershey,et al. Super-human multi-talker speech recognition: A graphical modeling approach , 2010, Comput. Speech Lang..
[9] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[10] Florin Curelaru,et al. Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).
[11] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[12] Douglas A. Reynolds,et al. Diarization of Telephone Conversations Using Factor Analysis , 2010, IEEE Journal of Selected Topics in Signal Processing.
[13] Tomohiro Nakatani,et al. Context adaptive deep neural networks for fast acoustic model adaptation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Reinhold Häb-Umbach,et al. Deep Attractor Networks for Speaker Re-Identification and Blind Source Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Steve Renals,et al. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[16] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[18] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[19] Ning Ma,et al. The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..
[20] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.
[21] Lukás Burget,et al. iVector-based discriminative adaptation for automatic speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[22] DeLiang Wang,et al. A Deep Ensemble Learning Method for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Jonathan Le Roux,et al. Deep clustering and conventional networks for music separation: Stronger together , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Jun Du,et al. A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] John R. Hershey,et al. VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking , 2018, INTERSPEECH.
[26] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Nima Mesgarani,et al. Deep attractor network for single-microphone speaker separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[29] Sree Hari Krishnan Parthasarathi,et al. Robust Speech Recognition via Anchor Word Representations , 2017, INTERSPEECH.
[30] Tomohiro Nakatani,et al. The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[31] Sri Harish Reddy Mallidi,et al. Device-directed Utterance Detection , 2018, INTERSPEECH.
[32] Tomohiro Nakatani,et al. Optimization of Speaker-Aware Multichannel Speech Extraction with ASR Criterion , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Tuomas Virtanen,et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Tomohiro Nakatani,et al. Compact Network for Speakerbeam Target Speaker Extraction , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Tomohiro Nakatani,et al. Learning speaker representation for neural network based multichannel speaker extraction , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[36] Lukás Burget,et al. Simplification and optimization of i-vector extraction , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Jun Wang,et al. Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures , 2018, INTERSPEECH.
[38] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Bhiksha Raj,et al. Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors , 2012, IEEE Signal Processing Magazine.
[40] Guy J. Brown,et al. Computational auditory scene analysis , 1994, Comput. Speech Lang..
[41] Jonathan Le Roux,et al. Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks , 2016, INTERSPEECH.
[42] Reinhold Häb-Umbach,et al. Optimizing neural-network supported acoustic beamforming by algorithmic differentiation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Khe Chai Sim,et al. Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models , 2016, INTERSPEECH.
[44] Jun Du,et al. Speech separation of a target speaker based on deep neural networks , 2014, 2014 12th International Conference on Signal Processing (ICSP).
[45] Daniel P. W. Ellis,et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics , 2014, ISMIR.
[46] Dong Yu,et al. Past review, current progress, and challenges ahead on the cocktail party problem , 2018, Frontiers of Information Technology & Electronic Engineering.
[47] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[48] Daniel Patrick Whittlesey Ellis,et al. Prediction-driven computational auditory scene analysis , 1996 .
[49] Reinhold Häb-Umbach,et al. Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[50] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[51] John R. Hershey,et al. Single-Channel Multitalker Speech Recognition , 2010, IEEE Signal Processing Magazine.
[52] Tomohiro Nakatani,et al. Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[53] Reinhold Häb-Umbach,et al. Neural network based spectral mask estimation for acoustic beamforming , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Tomohiro Nakatani,et al. Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures , 2017, INTERSPEECH.
[55] Andrew W. Senior,et al. Improving DNN speaker independence with I-vector inputs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Jont B. Allen,et al. Image method for efficiently simulating small‐room acoustics , 1976 .