Disentangling Style Factors from Speaker Representations