Statistical parametric speech synthesis: from HMM to LSTM-RNN
[1] S. King,et al. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis , 2013, SSW.
[2] Zhizheng Wu,et al. Sentence-level control vectors for deep neural network speech synthesis , 2015, INTERSPEECH.
[3] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[4] Heiga Zen,et al. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences , 2007, Comput. Speech Lang..
[5] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[6] Heiga Zen,et al. A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[7] Heiga Zen,et al. Decision tree-based context clustering based on cross validation and hierarchical priors , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] H. Zen,et al. An HMM-based speech synthesis system applied to English , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis.
[9] Bhuvana Ramabhadran,et al. F0 contour prediction with a deep belief network-Gaussian process hybrid model , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[10] Orhan Karaali,et al. Speech Synthesis with Neural Networks , 1998, ArXiv.
[11] Heiga Zen,et al. Deep learning in speech synthesis , 2013, SSW.
[12] Noel Massey,et al. Text-to-speech conversion with neural networks: a recurrent TDNN approach , 1998, EUROSPEECH.
[13] Anthony J. Robinson,et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding , 1987, NIPS.
[14] Zhizheng Wu,et al. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features , 2015, INTERSPEECH.
[15] Mike Schuster,et al. On supervised learning from sequential data with applications for speech recognition , 1999 .
[16] Heiga Zen,et al. Product of Experts for Statistical Parametric Speech Synthesis , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Mark J. F. Gales. Cluster adaptive training of hidden Markov models , 2000, IEEE Trans. Speech Audio Process..
[18] Ranniery Maia,et al. Towards a linear dynamical model based speech synthesizer , 2015, INTERSPEECH.
[19] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[20] P. J. Werbos. Backpropagation through time: what it does and how to do it , 1990 .
[21] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Heiga Zen,et al. Estimating Trajectory HMM Parameters Using Monte Carlo EM With Gibbs Sampler , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[23] Junichi Yamagishi,et al. Average-Voice-Based Speech Synthesis , 2006 .
[24] Heiga Zen,et al. A Hidden Semi-Markov Model-Based Speech Synthesis System , 2007, IEICE Trans. Inf. Syst..
[25] Marcus Liwicki,et al. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks , 2007 .
[26] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[27] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[28] C. M. Bishop. Mixture Density Networks , 1994 .
[29] Keiichi Tokuda,et al. An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features , 1995, EUROSPEECH.
[30] Takashi Nose,et al. Statistical Parametric Speech Synthesis Based on Gaussian Process Regression , 2014, IEEE Journal of Selected Topics in Signal Processing.
[31] K. Koishida,et al. Vector quantization of speech spectral parameters using statistics of dynamic features , 1997 .
[32] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[33] Keiichi Tokuda,et al. Duration modeling for HMM-based speech synthesis , 1998, ICSLP.
[34] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst..
[35] Keiichi Tokuda,et al. Speaker interpolation in HMM-based speech synthesis system , 1997, EUROSPEECH.
[36] Keiichi Tokuda,et al. Multi-Space Probability Distribution HMM , 2002 .
[37] Ranniery Maia,et al. Linear dynamical models in speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[39] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Heiga Zen,et al. The Effect of Using Normalized Models in Statistical Speech Synthesis , 2011, INTERSPEECH.
[42] 全 炳河,et al. Reformulating HMM as a trajectory model by imposing explicit relationships between static and dynamic features , 2006 .
[43] Simon King. A reading list of recent advances in speech synthesis , 2015 .
[44] Cassia Valentini-Botinhao,et al. Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] F. Itakura,et al. A statistical method for estimation of speech spectral density and formant frequencies , 1970 .
[46] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[47] Nam Soo Kim,et al. Decision Tree-Based Clustering with Outlier Detection for HMM-Based Speech Synthesis , 2011, INTERSPEECH.
[48] Yoshihiko Nankaku,et al. The effect of neural networks in statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Li-Rong Dai,et al. Statistical parametric speech synthesis using a hidden trajectory model , 2015, Speech Commun..
[50] Paavo Alku,et al. Voice source modelling using deep neural networks for statistical parametric speech synthesis , 2014, 2014 22nd European Signal Processing Conference (EUSIPCO).
[51] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[52] Heiga Zen,et al. Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis , 2011, Speech Commun..
[53] Zhi-Jie Yan,et al. Cross-validation based decision tree clustering for HMM-based TTS , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[54] Jeff A. Bilmes,et al. Robust splicing costs and efficient search with BMM Models for concatenative speech synthesis , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[55] Sridha Sridharan,et al. Trainable speech synthesis with trended hidden Markov models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[56] Dong Yu,et al. Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[57] Takashi Nose,et al. A Style Control Technique for HMM-Based Expressive Speech Synthesis , 2007, IEICE Trans. Inf. Syst..
[58] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[59] Frank K. Soong,et al. Generating natural F0 trajectory with additive trees , 2008, INTERSPEECH.
[60] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Helen M. Meng,et al. Multi-distribution deep belief network for speech synthesis , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[62] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[63] Ren-Hua Wang,et al. Minimum Generation Error Training for HMM-Based Speech Synthesis , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[64] Jong-Jin Kim,et al. HMM-based Korean speech synthesis system for hand-held devices , 2006, IEEE Transactions on Consumer Electronics.
[65] Keiichi Tokuda,et al. Eigenvoices for HMM-based speech synthesis , 2002, INTERSPEECH.
[66] Heiga Zen,et al. An excitation model for HMM-based speech synthesis based on residual modeling , 2007, SSW.
[67] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.
[68] Heiga Zen,et al. Autoregressive Models for Statistical Parametric Speech Synthesis , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[69] Harri Valpola,et al. Bayesian Ensemble Learning for Nonlinear Factor Analysis , 2000 .
[70] Zhizheng Wu,et al. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning , 2015, INTERSPEECH.
[71] Geoffrey E. Hinton,et al. On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[72] Nir Friedman,et al. Probabilistic Graphical Models , 2009, Data-Driven Computational Neuroscience.
[73] Yoshihiko Nankaku,et al. Contextual Additive Structure for HMM-Based Speech Synthesis , 2014, IEEE Journal of Selected Topics in Signal Processing.
[74] Frank K. Soong,et al. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[75] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[76] Heiga Zen,et al. Context-dependent additive log F0 model for HMM-based speech synthesis , 2009, INTERSPEECH.
[77] Dong Yu,et al. Deep Learning and Its Applications to Signal and Information Processing , 2011 .
[78] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[79] Sadaoki Furui,et al. Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..
[80] Georg Heigold,et al. An empirical study of learning rates in deep neural networks for speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[81] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[82] Haifeng Li,et al. Sequence error (SE) minimization training of neural network for voice conversion , 2014, INTERSPEECH.
[83] J. J. Odell,et al. The Use of Context in Large Vocabulary Speech Recognition , 1995 .
[84] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[85] S. Roweis,et al. Learning Nonlinear Dynamical Systems Using the Expectation–Maximization Algorithm , 2001 .
[86] William J. Byrne,et al. Autoregressive clustering for HMM speech synthesis , 2010, INTERSPEECH.
[87] Satoshi Imai,et al. Cepstral analysis synthesis on the mel frequency scale , 1983, ICASSP.
[88] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..
[89] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[90] Keiichi Tokuda,et al. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[91] Feng Ding,et al. A polynomial segment model based statistical parametric speech synthesis system , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[92] Mark J. F. Gales,et al. Switching linear dynamical systems for speech recognition , 2003 .
[93] Kai Yu,et al. An investigation of implementation and performance analysis of DNN based speech synthesis system , 2014, 2014 12th International Conference on Signal Processing (ICSP).
[94] Frank K. Soong,et al. Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree , 2014, INTERSPEECH.
[95] Heiga Zen,et al. Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[96] Tomoki Toda,et al. A postfilter to modify the modulation spectrum in HMM-based speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Shaul Markovitch,et al. Anytime Learning of Decision Trees , 2007, J. Mach. Learn. Res..
[98] Tony Robinson,et al. Speech synthesis using artificial neural networks trained on cepstral coefficients , 1993, EUROSPEECH.
[99] Carl Quillen. Kalman filter based speech synthesis , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[100] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.