Unsupervised Learning for Expressive Speech Synthesis
[1] J. Pierrehumbert. The phonology and phonetics of English intonation , 1987 .
[2] Bo Pang,et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.
[3] Oliver Watts,et al. Towards speaking style transplantation in speech synthesis , 2013, SSW.
[4] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[5] Simon King,et al. Towards minimum perceptual error training for DNN-based speech synthesis , 2015, INTERSPEECH.
[6] Antonio Bonafonte,et al. Ogmios: The UPC Text-to-Speech synthesis system for Spoken Translation , 2006 .
[7] Andrej Ljolje,et al. Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models , 1986, IEEE Trans. Acoust. Speech Signal Process..
[8] Takao Kobayashi,et al. Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis , 2005, IEICE Trans. Inf. Syst..
[9] Marc Schröder,et al. Emotional speech synthesis: a review , 2001, INTERSPEECH.
[10] Antonio Bonafonte,et al. Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model , 2016, SSW.
[11] Sang Joon Kim,et al. A Mathematical Theory of Communication , 2006 .
[12] Takao Kobayashi,et al. A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features , 2006, IEICE Trans. Inf. Syst..
[13] Kenneth Kuttler,et al. An Introduction To Linear Algebra , 2008 .
[14] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[15] J. Q. Stewart. An Electrical Analogue of the Vocal Organs , 1922, Nature.
[16] Björn W. Schuller,et al. The INTERSPEECH 2010 paralinguistic challenge , 2010, INTERSPEECH.
[17] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[18] Gemma Boleda,et al. Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus , 2010, LREC.
[19] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Mark J. F. Gales,et al. Unsupervised clustering of emotion and voice styles for expressive TTS , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Antonio Bonafonte,et al. Direct Expressive Voice Training Based on Semantic Selection , 2016, INTERSPEECH.
[22] Mark Liberman,et al. The intonational system of English , 1979 .
[23] Tatyana V. Polyàkova,et al. Grapheme-to-Phoneme Conversion in the Era of Globalization , 2015 .
[24] Satoshi Imai,et al. Cepstral analysis synthesis on the mel frequency scale , 1983, ICASSP.
[25] Mark J. F. Gales,et al. Integrated Expression Prediction and Speech Synthesis From Text , 2014, IEEE Journal of Selected Topics in Signal Processing.
[26] Takashi Nose,et al. A Style Control Technique for HMM-Based Expressive Speech Synthesis , 2007, IEICE Trans. Inf. Syst..
[27] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Bernd Möbius,et al. Ein quantitatives Modell der deutschen Intonation , 1993 .
[29] Marc Schröder,et al. Expressive Speech Synthesis: Past, Present, and Possible Futures , 2009, Affective Information Processing.
[30] B.-H. Juang,et al. On the hidden Markov model and dynamic time warping for speech recognition — A unified view , 1984, AT&T Bell Laboratories Technical Journal.
[31] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[32] Zhizheng Wu,et al. Investigating gated recurrent neural networks for speech synthesis , 2016, ArXiv.
[33] Florin Curelaru,et al. Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).
[34] Tamás Gábor Csapó,et al. Synthesizing expressive speech from amateur audiobook recordings , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[35] Paula Lopez-Otero,et al. iVectors for Continuous Emotion Recognition , 2014 .
[36] Mark A. Musen,et al. The protégé project: a look back and a look forward , 2015, SIGAI.
[37] Shankar Kumar,et al. Normalization of non-standard words , 2001, Comput. Speech Lang..
[38] Junichi Yamagishi,et al. Average-Voice-Based Speech Synthesis , 2006 .
[39] Ning Qian,et al. On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.
[40] Jean Vroomen,et al. Duration and intonation in emotional speech , 1993, EUROSPEECH.
[41] Antonio Bonafonte,et al. Acoustic feature prediction from semantic features for expressive speech using deep neural networks , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).
[42] Shigeo Morishima,et al. Emotion modeling in speech production using emotion space , 1996, Proceedings 5th IEEE International Workshop on Robot and Human Communication. RO-MAN'96 TSUKUBA.
[43] George Karypis,et al. Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering , 2004, Machine Learning.
[44] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[45] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[46] Takao Kobayashi,et al. Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing , 2005, IEICE Trans. Inf. Syst..
[47] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[48] R. Bensraj,et al. An Efficient Sentence-based Sentiment Analysis for Expressive Text-to-speech using Fuzzy Neural Network , 2014 .
[49] Richard Wiese. Silbische und lexikalische Phonologie : Studien zum Chinesischen und Deutschen , 1988 .
[50] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[51] W. Sendlmeier,et al. Verification of acoustical correlates of emotional speech using formant-synthesis , 2000 .
[52] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[53] Björn W. Schuller,et al. Recognizing Affect from Linguistic Information in 3D Continuous Space , 2011, IEEE Transactions on Affective Computing.
[54] Patrick Kenny,et al. Eigenvoice modeling with sparse training data , 2005, IEEE Transactions on Speech and Audio Processing.
[55] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[56] Peter Birkholz,et al. Articulatory Synthesis of Speech and Singing: State of the Art and Suggestions for Future Research , 2009, COST 2102 School.
[57] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[58] J. Pennebaker,et al. The Secret Life of Pronouns , 2003, Psychological science.
[59] Kai Yu,et al. Cluster Adaptive Training for Deep Neural Network Based Acoustic Model , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[60] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[61] Antonio Bonafonte,et al. Creating expressive synthetic voices by unsupervised clustering of audiobooks , 2015, INTERSPEECH.
[62] Jaime Lorenzo Trueba. Design and Evaluation of Statistical Parametric Techniques in Expressive Text-To-Speech: Emotion and Speaking Styles Transplantation , 2016 .
[63] Hai Zhao,et al. Word embedding for recurrent neural network based TTS synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[64] Simon King,et al. Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech , 2010, Speech Commun..
[65] Xin Wang,et al. Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis , 2016, IEICE Trans. Inf. Syst..
[66] J. van den Berg. Myoelastic-aerodynamic theory of voice production. , 1958, Journal of speech and hearing research.
[67] Charlotte Wollermann. Prosodie, nonverbale Signale, Unsicherheit und Kontext: Studien zur pragmatischen Fokusinterpretation , 2013 .
[68] Antonio Bonafonte,et al. Automatic voice-source parameterization of natural speech , 2005, INTERSPEECH.
[69] Heiga Zen,et al. An HMM-based singing voice synthesis system , 2006, INTERSPEECH.
[70] Julie Carson-Berndsen,et al. Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters , 2011, INTERSPEECH.
[71] Mike Rozak. Text-to-speech Designed For a Massively Multiplayer Online Role-Playing Game (MMORPG) , 2007 .
[72] Junichi Yamagishi,et al. Expressive Speech Synthesis Using Sentiment Embeddings , 2018, INTERSPEECH.
[73] Takashi Nose,et al. HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation , 2009, IEICE Trans. Inf. Syst..
[74] Fasih Haider,et al. Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis , 2016, SSW.
[75] Tillman Weyde,et al. A Neural Probabilistic Model for Predicting Melodic Sequences , 2013 .
[76] Zhizheng Wu,et al. A study of speaker adaptation for DNN-based speech synthesis , 2015, INTERSPEECH.
[77] Shinji Takaki,et al. Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis , 2016, Recent Advances in Nonlinear Speech Processing.
[78] Antonio Bonafonte,et al. Prosodic and Spectral iVectors for Expressive Speech Synthesis , 2016, SSW.
[79] R. Gray,et al. Vector quantization , 1984, IEEE ASSP Magazine.
[80] Björn W. Schuller,et al. Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles , 2005, INTERSPEECH.
[81] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[82] Iain R. Murray,et al. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. , 1993, The Journal of the Acoustical Society of America.
[83] Jan P. H. van Santen,et al. Assignment of segmental duration in text-to-speech synthesis , 1994, Comput. Speech Lang..
[84] Inma Hernáez,et al. Improved HNM-Based Vocoder for Statistical Synthesizers , 2011, INTERSPEECH.
[85] John L. Arnott,et al. Implementation and testing of a system for producing emotion-by-rule in synthetic speech , 1995, Speech Commun..
[86] Tomoki Toda,et al. Recent developments of the HMM-based speech synthesis system (HTS) , 2007 .
[87] D H Klatt,et al. Review of text-to-speech conversion for English. , 1987, The Journal of the Acoustical Society of America.
[88] Takao Kobayashi,et al. Modeling of various speaking styles and emotions for HMM-based speech synthesis , 2003, INTERSPEECH.
[89] D. Klatt. Letter: Interaction between two factors that influence vowel duration. , 1973, The Journal of the Acoustical Society of America.
[90] Susan Fitt,et al. Robust LTS rules with the Combilex speech technology lexicon , 2009, INTERSPEECH.
[91] K. Stevens,et al. An Electrical Analog of the Vocal Tract , 1953 .
[92] Patrick Kenny,et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .
[93] S. King,et al. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis , 2013, SSW.
[94] Paul Boersma,et al. Praat: doing phonetics by computer , 2003 .
[95] Björn W. Schuller,et al. The INTERSPEECH 2009 emotion challenge , 2009, INTERSPEECH.
[96] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[97] Jordi Luque,et al. Jitter and Shimmer Measurements for Speaker Diarization , 2014 .
[98] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.
[99] E. B. Newman,et al. A Scale for the Measurement of the Psychological Magnitude Pitch , 1937 .
[100] Gerold Ungeheuer. Elemente einer Akustischen Theorie der Vokalartikulation , 1962 .
[101] Junichi Yamagishi,et al. Towards Cross-Lingual Emotion Transplantation , 2014, IberSPEECH.
[102] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[103] Francesc Alías,et al. Sentence-Based Sentiment Analysis for Expressive Text-to-Speech , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[104] J. Montero,et al. ANALYSIS AND MODELLING OF EMOTIONAL SPEECH IN SPANISH , 1999 .
[105] Michael Picheny,et al. The IBM expressive speech synthesis system , 2004, INTERSPEECH.
[106] Paul Taylor,et al. A Phonetic Model of English Intonation , 1992 .
[107] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[108] Samy Bengio,et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model , 2017, ArXiv.
[109] Oliver Watts,et al. Unsupervised learning for text-to-speech synthesis , 2013 .
[110] Antonio Bonafonte,et al. Prosodic Break Prediction with RNNs , 2016, IberSPEECH.
[111] G. N. Lance,et al. Mixed-Data Classificatory Programs I - Agglomerative Systems , 1967, Aust. Comput. J..