Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
Moacir Antonelli Ponti | Sandra Maria Aluísio | Edresson Casanova | Christopher Shulby | Arnaldo Cândido Júnior | Lucas Rafael Stefanel Gris | Frederico Santos de Oliveira | Hamilton Pereira da Silva