Emotional Voice Conversion Using a Hybrid Framework With Speaker-Adaptive DNN and Particle-Swarm-Optimized Neural Network
Susmitha Vekkot | Deepa Gupta | Mohammed Zakariah | Yousef Ajami Alotaibi