Atss-Net: Target Speaker Separation via Attention-based Neural Network
Ming Li | Qingjian Lin | Tingle Li | Yuanyuan Bao