[1] Masato Akagi, et al. Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space, 2018, Speech Commun.
[2] Haizhou Li, et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder, 2018, INTERSPEECH.
[3] Haizhou Li, et al. Adaptive WaveNet Vocoder for Residual Compensation in GAN-Based Voice Conversion, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[4] Tetsuya Takiguchi, et al. Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform, 2017, EURASIP Journal on Audio, Speech, and Music Processing.
[5] Jo Yew Tham, et al. Attribute Manipulation Generative Adversarial Networks for Fashion Images, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Haizhou Li, et al. Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion, 2016, INTERSPEECH.
[7] Chung-Hsien Wu, et al. Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Haizhou Li, et al. Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Hamidou Tembine, et al. Nonparallel Emotional Speech Conversion, 2018, INTERSPEECH.
[10] Tetsuya Takiguchi, et al. Exemplar-based emotional voice conversion using non-negative matrix factorization, 2014, Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific.
[11] Zhizheng Wu, et al. On the use of I-vectors and average voice model for voice conversion without parallel data, 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[12] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[13] Haizhou Li, et al. Fundamental frequency modeling using wavelets for emotional voice conversion, 2015, 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).
[14] Aijun Li, et al. Prosody conversion from neutral speech to emotional speech, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Moncef Gabbouj, et al. Voice Conversion Using Partial Least Squares Regression, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[16] Steve J. Young, et al. Data-driven emotion conversion in spoken English, 2009, Speech Commun.
[17] Tetsuya Takiguchi, et al. High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion, 2014, INTERSPEECH.
[18] Esther Klabbers, et al. Decomposition of pitch curves in the general superpositional intonation model, 2006.
[19] Kou Tanaka, et al. CycleGAN-VC2: Improved CycleGAN-Based Non-parallel Voice Conversion, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Yu Tsao, et al. Voice conversion from non-parallel corpora using variational auto-encoder, 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[21] Haizhou Li, et al. Sparse representation of phonetic features for voice conversion with and without parallel data, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[22] Hirokazu Kameoka, et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks, 2017, ArXiv.
[23] Haizhou Li, et al. WaveTTS: Tacotron-Based TTS with Joint Time-Frequency Domain Loss, 2020, ArXiv.
[24] K. Scherer, et al. Vocal cues in emotion encoding and decoding, 1991.
[25] Tetsuya Takiguchi, et al. GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features, 2012.
[26] Satoshi Nakamura, et al. Voice conversion through vector quantization, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.
[27] Jun-Yan Zhu, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[28] Hao Wang, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[29] Tetsuya Takiguchi, et al. Emotional Voice Conversion Using Dual Supervised Adversarial Networks With Continuous Wavelet Transform F0 Features, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Haizhou Li, et al. Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion, 2018, INTERSPEECH.
[31] Satoshi Nakamura, et al. Speaker adaptation and voice conversion by codebook mapping, 1991, 1991 IEEE International Symposium on Circuits and Systems.
[32] Marc Schröder, et al. Emotional speech synthesis: a review, 2001, INTERSPEECH.
[33] Moncef Gabbouj, et al. Hierarchical modeling of F0 contours for voice conversion, 2014, INTERSPEECH.
[34] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[35] Martti Vainio, et al. Continuous wavelet transform for analysis of speech prosody, 2013.
[36] Hirokazu Kameoka, et al. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks, 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[37] Dimitris N. Metaxas, et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Kishore Prahallad, et al. Spectral Mapping Using Artificial Neural Networks for Voice Conversion, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[39] Haizhou Li, et al. SINGAN: Singing Voice Conversion with Generative Adversarial Networks, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[40] Tetsuya Takiguchi, et al. Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data, 2017, INTERSPEECH.
[41] Junichi Yamagishi, et al. Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis, 2018, Speech Commun.
[42] Haizhou Li, et al. Transformation of prosody in voice conversion, 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[43] Tetsuya Takiguchi, et al. Emotional voice conversion using deep neural networks with MCC and F0 features, 2016, 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS).
[44] Haizhou Li, et al. Exemplar-based sparse representation of timbre and prosody for voice conversion, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Hao Wang, et al. Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams, 2016, INTERSPEECH.
[46] Haizhou Li, et al. Teacher-Student Training for Robust Tacotron-Based TTS, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Lirong Dai, et al. Emotional statistical parametric speech synthesis using LSTM-RNNs, 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[48] Yi Xu. Speech prosody: A methodological review, 2011.
[49] Michael Lenz, et al. Estimation of the parameters of the quantitative intonation model with continuous wavelet analysis, 2003, INTERSPEECH.
[50] Jo Yew Tham, et al. Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network, 2020, Pattern Recognit. Lett.
[51] Haizhou Li, et al. On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[52] Nick Campbell, et al. A corpus-based speech synthesis system with emotion, 2003, Speech Commun.
[53] Haizhou Li, et al. Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[54] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[55] Axel Röbel, et al. Sequence-to-Sequence Modelling of F0 for Speech Emotion Conversion, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Li-Rong Dai, et al. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[57] Jo Yew Tham, et al. Semantically Consistent Hierarchical Text to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[58] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[59] R. Kubichek. Mel-cepstral distance measure for objective speech quality assessment, 1993, Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.
[60] Chi-Keung Tang, et al. Attribute-Guided Face Generation Using Conditional CycleGAN, 2017, ECCV.
[61] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[62] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[63] Haizhou Li, et al. Phonetically Aware Exemplar-Based Prosody Transformation, 2018, Odyssey.
[64] S. Ramakrishnan, et al. Speech Enhancement, Modeling and Recognition: Algorithms and Applications, 2014.
[65] Paavo Alku, et al. Wavelets for intonation modeling in HMM speech synthesis, 2013, SSW.
[66] Chi-Keung Tang, et al. Conditional CycleGAN for Attribute Guided Face Image Generation, 2017, ArXiv.
[67] Haizhou Li, et al. Emotional facial expression transfer based on temporal restricted Boltzmann machines, 2014, Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific.
[68] Masami Akamine, et al. Multilevel parametric-base F0 model for speech synthesis, 2008, INTERSPEECH.