Utterance-level Aggregation for Speaker Recognition in the Wild
Joon Son Chung | Andrew Zisserman | Weidi Xie | Arsha Nagrani
[1] Quan Wang, et al. Attention-Based Models for Text-Dependent Speaker Verification, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Oliver Durr, et al. Speaker identification and clustering using convolutional neural networks, 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
[3] Koichi Shinoda, et al. Attentive Statistics Pooling for Deep Speaker Embedding, 2018, INTERSPEECH.
[4] Andrea Vedaldi, et al. MatConvNet: Convolutional Neural Networks for MATLAB, 2014, ACM Multimedia.
[5] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[6] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Ming Li, et al. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System, 2018, Odyssey.
[8] Josef Sivic, et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Xiao Liu, et al. Deep Speaker: an End-to-End Neural Speaker Embedding System, 2017, ArXiv.
[10] Huizhong Chen, et al. Residual Enhanced Visual Vectors for on-device image matching, 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).
[11] Dengxin Dai, et al. Unified Hypersphere Embedding for Speaker Recognition, 2018, ArXiv.
[12] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[13] Ming Li, et al. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Ming Li, et al. Analysis of Length Normalization in End-to-End Speaker Verification System, 2018, INTERSPEECH.
[15] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[16] Patrick Kenny, et al. Deep Speaker Embeddings for Short-Duration Speaker Verification, 2017, INTERSPEECH.
[17] Ming Li, et al. End-to-end Language Identification using NetFV and NetVLAD, 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[18] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[19] Quan Wang, et al. Generalized End-to-End Loss for Speaker Verification, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Sanjeev Khudanpur, et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification, 2017, INTERSPEECH.
[21] Jian Cheng, et al. Additive Margin Softmax for Face Verification, 2018, IEEE Signal Processing Letters.
[22] Joaquín González-Rodríguez, et al. Automatic language identification using deep neural networks, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Hao Tang, et al. Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[24] Andrew Zisserman, et al. GhostVLAD for set-based face recognition, 2018, ACCV.
[25] Aaron Lawson, et al. The Speakers in the Wild (SITW) Speaker Recognition Database, 2016, INTERSPEECH.