VoxCeleb: A Large-Scale Speaker Identification Dataset
暂无分享,去创建一个
[1] Jonathan G. Fiscus,et al. DARPA TIMIT:: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .
[2] Jean Carletta,et al. The AMI meeting corpus , 2005 .
[3] Andrew Zisserman,et al. Taking the bite out of automated naming of characters in TV video , 2009, Image Vis. Comput..
[4] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[5] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] John H. L. Hansen,et al. High performance digit recognition in real car environments , 2002, INTERSPEECH.
[7] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[8] Aaron Lawson,et al. The Speakers in the Wild (SITW) Speaker Recognition Database , 2016, INTERSPEECH.
[9] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[10] Sébastien Marcel,et al. MOBIO Database for the ICPR 2010 Face and Speech Competition , 2009 .
[11] Mark J. F. Gales,et al. The MGB challenge: Evaluating multi-genre broadcast media recognition , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[12] The NIST Year 2012 Speaker Recognition Evaluation Plan 1 I , 2022 .
[13] Alex Park,et al. The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.
[14] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[15] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[16] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[17] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Patrick Kenny,et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .
[19] Oliver Durr,et al. Speaker identification and clustering using convolutional neural networks , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
[20] John H. L. Hansen,et al. Robust speech recognition in noise: an evaluation using the SPINE corpus , 2001, INTERSPEECH.
[21] Dominique Genoud,et al. POLYCOST: A telephone-speech database for speaker recognition , 2000, Speech Commun..
[22] Lara Lynn Stoll,et al. Finding Difficult Speakers in Automatic Speaker Recognition , 2011 .
[23] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[24] David A. van Leeuwen,et al. NFI-FRITS: A forensic speaker recognition database and some first experiments , 2014, Odyssey.
[25] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[26] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[27] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[28] Andrea Vedaldi,et al. MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.
[29] Andreas Stolcke,et al. The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[30] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[31] J.B. Millar,et al. The Australian National Database of Spoken Language , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[32] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[33] Tinne Tuytelaars,et al. Cross-Modal Supervision for Learning Active Speaker Detection in Video , 2016, ECCV.
[34] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[35] Sergey Ioffe,et al. Probabilistic Linear Discriminant Analysis , 2006, ECCV.
[36] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[37] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[38] Lars Kai Hansen,et al. A New Database for Speaker Recognition , 2005 .
[39] Douglas A. Reynolds,et al. Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..