VoxCeleb: Large-scale speaker verification in the wild
Joon Son Chung | Andrew Zisserman | Weidi Xie | Arsha Nagrani
[1] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[2] Jean Carletta, et al. The AMI meeting corpus, 2005.
[3] David A. van Leeuwen, et al. NFI-FRITS: A forensic speaker recognition database and some first experiments, 2014, Odyssey.
[4] Quan Wang, et al. Generalized End-to-End Loss for Speaker Verification, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[6] Ira Kemelmacher-Shlizerman, et al. The MegaFace Benchmark: 1 Million Faces for Recognition at Scale, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Richard C. Rose, et al. Deep bottleneck features for i-vector based text-independent speaker verification, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[8] Andrew Zisserman, et al. Taking the bite out of automated naming of characters in TV video, 2009, Image Vis. Comput.
[9] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.
[10] Josef Sivic, et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] The NIST Year 2012 Speaker Recognition Evaluation Plan, 2022.
[12] John H. L. Hansen, et al. I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences, 2019, INTERSPEECH.
[13] Dengxin Dai, et al. Unified Hypersphere Embedding for Speaker Recognition, 2018, ArXiv.
[14] Josephine Sullivan, et al. One millisecond face alignment with an ensemble of regression trees, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[16] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[17] Hao Tang, et al. Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[18] Ming Li, et al. Analysis of Length Normalization in End-to-End Speaker Verification System, 2018, INTERSPEECH.
[19] Andrew Zisserman, et al. Deep Face Recognition, 2015, BMVC.
[20] Yann LeCun, et al. Dimensionality Reduction by Learning an Invariant Mapping, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[21] Sergey Ioffe, et al. Probabilistic Linear Discriminant Analysis, 2006, ECCV.
[22] Aaron Lawson, et al. The Speakers in the Wild (SITW) Speaker Recognition Database, 2016, INTERSPEECH.
[23] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.
[24] Razvan Pascanu, et al. A simple neural network module for relational reasoning, 2017, NIPS.
[25] John H. L. Hansen, et al. Speaker Recognition by Machines and Humans: A tutorial review, 2015, IEEE Signal Processing Magazine.
[26] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Douglas A. Reynolds. Speaker and language recognition: a guided safari, 2008, Odyssey.
[28] Davis E. King, et al. Dlib-ml: A Machine Learning Toolkit, 2009, J. Mach. Learn. Res.
[29] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[30] Andrew Zisserman, et al. GhostVLAD for set-based face recognition, 2018, ACCV.
[31] Huizhong Chen, et al. Residual Enhanced Visual Vectors for on-device image matching, 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).
[32] Lara Lynn Stoll, et al. Finding Difficult Speakers in Automatic Speaker Recognition, 2011.
[33] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Lars Kai Hansen, et al. A New Database for Speaker Recognition, 2005.
[35] Joon Son Chung, et al. Learning to lip read words by watching videos, 2018, Comput. Vis. Image Underst.
[36] Andrea Vedaldi, et al. MatConvNet: Convolutional Neural Networks for MATLAB, 2014, ACM Multimedia.
[37] Andreas Stolcke, et al. The ICSI Meeting Corpus, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
[38] Joon Son Chung, et al. You said that?, 2017, BMVC.
[39] Joon Son Chung, et al. Utterance-level Aggregation for Speaker Recognition in the Wild, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Quan Wang, et al. Attention-Based Models for Text-Dependent Speaker Verification, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Jaakko Lehtinen, et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion, 2017, ACM Trans. Graph.
[42] John H. L. Hansen, et al. Robust speech recognition in noise: an evaluation using the SPINE corpus, 2001, INTERSPEECH.
[43] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[44] Li Shen, et al. Comparator Networks, 2018, ECCV.
[45] Erik McDermott, et al. Deep neural networks for small footprint text-dependent speaker verification, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Tinne Tuytelaars, et al. Cross-Modal Supervision for Learning Active Speaker Detection in Video, 2016, ECCV.
[47] Sadaoki Furui, et al. Automatic Speech Recognition and Understanding Proceedings, 1997.
[48] Sébastien Marcel, et al. MOBIO Database for the ICPR 2010 Face and Speech Competition, 2009.
[49] Xiang Yu, et al. Deep Metric Learning via Lifted Structured Feature Embedding, 2016.
[50] Bhiksha Raj, et al. SphereFace: Deep Hypersphere Embedding for Face Recognition, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Mark J. F. Gales, et al. The MGB challenge: Evaluating multi-genre broadcast media recognition, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[52] Joon Son Chung, et al. The Conversation: Deep Audio-Visual Speech Enhancement, 2018, INTERSPEECH.
[53] Patrick Kenny, et al. Deep Speaker Embeddings for Short-Duration Speaker Verification, 2017, INTERSPEECH.
[54] Florin Curelaru, et al. Front-End Factor Analysis For Speaker Verification, 2018, 2018 International Conference on Communications (COMM).
[55] Andrew Zisserman, et al. Learnable PINs: Cross-Modal Embeddings for Person Identity, 2018, ECCV.
[56] Douglas A. Reynolds, et al. Robust text-independent speaker identification using Gaussian mixture speaker models, 1995, IEEE Trans. Speech Audio Process.
[57] Omkar M. Parkhi, et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[58] Kevin Wilson, et al. Looking to listen at the cocktail party, 2018, ACM Trans. Graph.
[59] Joon Son Chung, et al. Lip Reading in Profile, 2017, BMVC.
[60] Alex Park, et al. The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments, 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.
[61] Yun Lei, et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Dominique Genoud, et al. POLYCOST: A telephone-speech database for speaker recognition, 2000, Speech Commun.
[63] Xiao Liu, et al. Deep Speaker: an End-to-End Neural Speaker Embedding System, 2017, ArXiv.
[64] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[65] J.B. Millar, et al. The Australian National Database of Spoken Language, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[66] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.
[67] Georg Heigold, et al. End-to-end text-dependent speaker verification, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] John H. L. Hansen, et al. High performance digit recognition in real car environments, 2002, INTERSPEECH.
[69] Andrew Zisserman, et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[70] John J. Godfrey, et al. SWITCHBOARD: telephone speech corpus for research and development, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[71] Sanjeev Khudanpur, et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification, 2017, INTERSPEECH.
[72] Yuxiao Hu, et al. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition, 2016, ECCV.
[73] Yann LeCun, et al. Learning a similarity metric discriminatively, with application to face verification, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[74] Jian Cheng, et al. Additive Margin Softmax for Face Verification, 2018, IEEE Signal Processing Letters.