State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations
Alan McCree | Daniel Garcia-Romero | Gregory Sell | Nanxin Chen | Najim Dehak | Jesús Villalba | David Snyder | Pedro A. Torres-Carrasquillo | Fred Richardson | Leibny Paola García-Perera | Réda Dehak | Jonas Borgstrom
[1] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[3] Lukás Burget,et al. Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Kyu J. Han,et al. Densely Connected Networks for Conversational Speech Recognition , 2018, INTERSPEECH.
[5] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[6] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process.
[7] Ming Li,et al. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Yiming Wang,et al. Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks , 2018, INTERSPEECH.
[9] Ying Zhang,et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks , 2016, INTERSPEECH.
[10] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[11] Eduardo Lleida,et al. The I3a speaker recognition system for NIST SRE12: post-evaluation analysis , 2013, INTERSPEECH.
[12] Niko Brümmer,et al. Unsupervised Domain Adaptation for I-Vector Speaker Recognition , 2014, Odyssey.
[13] Yun Lei,et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Alvin F. Martin,et al. The NIST Speaker Recognition Evaluations: 1996-2001 , 1998, Odyssey.
[15] Eduardo Lleida,et al. Unsupervised adaptation of PLDA by using variational Bayes methods , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Sri Harish Reddy Mallidi,et al. Neural Network Bottleneck Features for Language Identification , 2014, Odyssey.
[17] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Eduardo Lleida,et al. Bayesian adaptation of PLDA based speaker recognition to domains with scarce development data , 2012, Odyssey.
[19] Chunlei Zhang,et al. End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances , 2017, INTERSPEECH.
[20] Patrick Kenny,et al. Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[21] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[22] Vincent M. Stanford,et al. The 2021 NIST Speaker Recognition Evaluation , 2022, Odyssey.
[23] Aaron Lawson,et al. The Speakers in the Wild (SITW) Speaker Recognition Database , 2016, INTERSPEECH.
[24] Aaron Lawson,et al. On the Issue of Calibration in DNN-Based Speaker Recognition Systems , 2016, INTERSPEECH.
[25] Niko Brümmer,et al. The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF , 2013, ArXiv.
[26] Lukás Burget,et al. Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors , 2018, INTERSPEECH.
[27] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process.
[28] Douglas A. Reynolds,et al. The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective , 2000, Speech Commun.
[29] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Sanjeev Khudanpur,et al. Parallel training of DNNs with Natural Gradient and Parameter Averaging , 2014 .
[31] Sanjeev Khudanpur,et al. Spoken Language Recognition using X-vectors , 2018, Odyssey.
[32] Douglas A. Reynolds,et al. Approaches and applications of audio diarization , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[33] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[34] Bhiksha Raj,et al. SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Ming Li,et al. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System , 2018, Odyssey.
[36] Douglas A. Reynolds,et al. The 2018 NIST Speaker Recognition Evaluation , 2019, INTERSPEECH.
[37] Niko Brümmer,et al. Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance , 2011, INTERSPEECH.
[38] Yifan Gong,et al. End-to-End attention based text-dependent speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[39] Alvin F. Martin,et al. NIST Speaker Recognition Evaluations Utilizing the Mixer Corpora—2004, 2005, 2006 , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[40] Shinji Watanabe,et al. Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge , 2018, INTERSPEECH.
[41] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[42] Pietro Laface,et al. Pairwise Discriminative Speaker Verification in the I-Vector Space , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[43] Niko Brümmer,et al. The speaker partitioning problem , 2010, Odyssey.
[44] Sanjeev Khudanpur,et al. Speaker Recognition for Multi-speaker Conversations Using X-vectors , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Aleksandr Sizov,et al. Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication , 2014, S+SSPR.
[46] Vincent M. Stanford,et al. Performance factor analysis for the 2012 NIST speaker recognition evaluation , 2014, INTERSPEECH.
[47] Pietro Laface,et al. Gender independent discriminative speaker recognition in i-vector space , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[49] Sanjeev Khudanpur,et al. Deep neural network-based speaker embeddings for end-to-end speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[50] Alan McCree,et al. Speaker diarization using deep neural network embeddings , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[52] Sanjeev Khudanpur,et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification , 2017, INTERSPEECH.
[53] Quan Wang,et al. Attention-Based Models for Text-Dependent Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Daniel Garcia-Romero,et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.
[55] Georg Heigold,et al. End-to-end text-dependent speaker verification , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Patrick Kenny,et al. Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.
[57] Douglas A. Reynolds,et al. Deep Neural Network Approaches to Speaker and Language Recognition , 2015, IEEE Signal Processing Letters.
[58] Niko Brümmer,et al. End-to-End versus Embedding Neural Networks for Language Recognition in Mismatched Conditions , 2018, Odyssey.
[59] Alan McCree,et al. Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17 , 2018, Odyssey.
[60] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[61] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Erik McDermott,et al. Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).