Overcoming the limitations of statistical parametric speech synthesis
[1] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[2] Heiga Zen,et al. The HMM-based speech synthesis system (HTS) version 2.0 , 2007, SSW.
[3] Keiichi Tokuda,et al. Introduction to the Issue on Statistical Parametric Speech Synthesis , 2014, IEEE J. Sel. Top. Signal Process.
[4] Paavo Alku,et al. The GlottHMM Entry for Blizzard Challenge 2012: Hybrid Approach , 2012 .
[5] Ren-Hua Wang,et al. The USTC System for Blizzard Challenge 2010 , 2008 .
[6] Paavo Alku,et al. Wavelets for intonation modeling in HMM speech synthesis , 2013, SSW.
[7] Junichi Yamagishi,et al. An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis , 2014, INTERSPEECH.
[8] Simon King,et al. Smooth talking: Articulatory join costs for unit selection , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Cassia Valentini-Botinhao,et al. Hurricane natural speech corpus , 2013 .
[10] Tomoki Toda,et al. Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis , 2014, IEEE Journal of Selected Topics in Signal Processing.
[11] Vincent Pollet,et al. Refined inter-segment joining in multi-form speech synthesis , 2014, INTERSPEECH.
[12] Sin-Horng Chen,et al. An RNN-based prosodic information synthesizer for Mandarin text-to-speech , 1998, IEEE Trans. Speech Audio Process.
[13] Paavo Alku,et al. HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Philip J. B. Jackson,et al. Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech , 2001, IEEE Trans. Speech Audio Process.
[16] Simon King,et al. Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech , 2014, INTERSPEECH.
[17] Tomoki Toda,et al. Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Thierry Dutoit,et al. The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[19] Kazuhito Koishida. Low Bit Rate Speech Coding Based on Mel-Generalized Cepstral Analysis , 1998 .
[20] Tomoki Toda,et al. Improvements to HMM-based speech synthesis based on parameter generation with rich context models , 2013, INTERSPEECH.
[21] Heiga Zen,et al. An overview of nitech HMM-based speech synthesis system for blizzard challenge 2005 , 2005, INTERSPEECH.
[22] Tomoki Toda,et al. A postfilter to modify the modulation spectrum in HMM-based speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Tuomo Raitio,et al. DNN-based stochastic postfilter for HMM-based speech synthesis , 2014, INTERSPEECH.
[24] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Junichi Yamagishi,et al. A fixed dimension and perceptually based dynamic sinusoidal model of speech , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Cassia Valentini-Botinhao,et al. Intelligibility enhancement of synthetic speech in noise , 2013 .
[27] Keiichi Tokuda,et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation , 1994, ICSLP.
[28] Paavo Alku,et al. Comparing glottal-flow-excited statistical parametric speech synthesis methods , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Simon King,et al. Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis , 2011, Speech Commun.
[30] John R. Hershey,et al. Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[31] Vincent Pollet,et al. Uniform Speech Parameterization for Multi-Form Segment Synthesis , 2011, INTERSPEECH.
[32] Yannis Stylianou,et al. Applying the harmonic plus noise model in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process.
[33] Tomoki Toda,et al. An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis , 2012, INTERSPEECH.
[34] Yamato Ohtani,et al. Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification? , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Qian Yao. A Unified Trajectory Tiling Approach to High Quality Speech Rendering , 2013 .
[36] Georg Heigold,et al. Word embeddings for speech recognition , 2014, INTERSPEECH.
[37] Heiga Zen,et al. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Keiichi Tokuda,et al. An adaptive algorithm for mel-cepstral analysis of speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[39] Moncef Gabbouj,et al. Ways to Implement Global Variance in Statistical Speech Synthesis , 2012, INTERSPEECH.
[40] Simon King,et al. Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis , 2014, INTERSPEECH.
[41] Simon King,et al. Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Shigeo Abe. Pattern Classification , 2001, Springer London.
[43] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun.
[44] Heiga Zen,et al. The Effect of Using Normalized Models in Statistical Speech Synthesis , 2011, INTERSPEECH.
[45] Tomoki Toda,et al. Modified post-filter to recover modulation spectrum for HMM-based speech synthesis , 2014, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[46] Heiga Zen,et al. A Hidden Semi-Markov Model-Based Speech Synthesis System , 2007, IEICE Trans. Inf. Syst.
[47] Daniel Erro,et al. Flexible harmonic/stochastic speech synthesis , 2007, SSW.
[48] Alan W. Black,et al. CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling , 2006, INTERSPEECH.
[49] Simon King,et al. Multidimensional scaling of listener responses to synthetic speech , 2005, INTERSPEECH.
[50] Junichi Yamagishi,et al. Multiple feed-forward deep neural networks for statistical parametric speech synthesis , 2015, INTERSPEECH.
[51] Cassia Valentini-Botinhao,et al. Intelligibility-enhancing speech modifications: the hurricane challenge , 2013, INTERSPEECH.
[52] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[53] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[54] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Ren-Hua Wang,et al. Minimum unit selection error training for HMM-based unit selection speech synthesis system , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[56] Keiichi Tokuda,et al. Duration modeling for HMM-based speech synthesis , 1998, ICSLP.
[57] Hermann Ney,et al. Evaluation of VTLN-based voice conversion for embedded speech synthesis , 2005, INTERSPEECH.
[58] I. Titze. Nonlinear source-filter coupling in phonation: theory , 2008, The Journal of the Acoustical Society of America.
[59] Yoshihiko Nankaku,et al. The effect of neural networks in statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[60] Simon King,et al. Robustness of HMM-based speech synthesis , 2008, INTERSPEECH.
[61] Paul Taylor,et al. The target cost formulation in unit selection speech synthesis , 2006, INTERSPEECH.
[62] Alistair Conkie. A robust unit selection system for speech synthesis , 1999 .
[63] Hideki Kawahara,et al. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds , 2006 .
[64] Simon King,et al. Measuring a decade of progress in Text-to-Speech , 2014 .
[65] Final Report: OUCH Project (Outing Unfortunate Characteristics of HMMs) , 2013 .
[66] Heiga Zen,et al. Hidden semi-Markov model based speech synthesis , 2004, INTERSPEECH.
[67] Yannis Stylianou,et al. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification , 1996 .
[68] A. Bonafonte,et al. Flexible Harmonic/Stochastic Modeling for HMM-Based Speech Synthesis , 2008 .
[69] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst.
[70] Paavo Alku,et al. Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering , 1991, Speech Commun..
[71] Keiichi Tokuda,et al. Multi-Space Probability Distribution HMM , 2002 .
[72] Zhi-Jie Yan,et al. Rich-context Unit Selection (RUS) approach to high quality TTS , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[73] Koichi Shinoda,et al. MDL-based context-dependent subword modeling for speech recognition , 2000 .
[74] Zhenhua Ling. HMM-based Unit Selection Using F , 2006 .
[75] Ren-Hua Wang,et al. HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[76] Zhizheng Wu,et al. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features , 2015, INTERSPEECH.
[77] Mark J. F. Gales,et al. The Application of Hidden Markov Models in Speech Recognition , 2007, Found. Trends Signal Process.
[78] Zhizheng Wu,et al. Deep neural network context embeddings for model selection in rich-context HMM synthesis , 2015, INTERSPEECH.
[79] Mark J. F. Gales,et al. Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process.
[80] Srikanth Ronanki,et al. The CSTR entry to the Blizzard Challenge 2016 , 2016 .
[81] Frank K. Soong,et al. A cross-language state mapping approach to bilingual (Mandarin-English) TTS , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[82] Michal Tadeusz Kaszczuk,et al. The IVO Software Blizzard Challenge 2009 Entry: Improving IVONA Text-To-Speech , 2009 .
[83] Simon King,et al. Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis , 2004, IEEE Transactions on Audio, Speech, and Language Processing.
[84] Paavo Alku,et al. The GlottHMM Speech Synthesis Entry for Blizzard Challenge 2010 , 2010 .
[85] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[86] Zhi-Jie Yan,et al. Rich context modeling for high quality HMM-based TTS , 2009, INTERSPEECH.
[87] Antonio Bonafonte,et al. A Bilingual Spanish-Catalan Database of Units for Concatenative Synthesis , 1997 .
[88] Junichi Yamagishi,et al. A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis , 2015, INTERSPEECH.
[89] Junichi Yamagishi,et al. Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis , 2007, SSW.
[90] Gunnar Fant,et al. Acoustic Theory Of Speech Production , 1960 .
[91] Paul Taylor,et al. The architecture of the Festival speech synthesis system , 1998, SSW.
[92] Paul Taylor. Unifying unit selection and hidden Markov model speech synthesis , 2006, INTERSPEECH.
[93] Oliver Watts,et al. Knowledge versus data in TTS: evaluation of a continuum of synthesis systems , 2015, INTERSPEECH.
[94] Philip C. Woodland,et al. Automatic speech synthesiser parameter estimation using HMMs , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[95] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[96] Hirokazu Kameoka,et al. Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models , 2013, SSW.
[97] Simon King,et al. Multisyn: Open-domain unit selection for the Festival speech synthesis system , 2007, Speech Commun.
[98] Michal Tadeusz Kaszczuk,et al. The IVO Software Blizzard 2007 Entry: Improving Ivona Speech Synthesis System , 2007 .
[99] Simon King,et al. An introduction to statistical parametric speech synthesis , 2011 .
[100] Aimilios Chalamandaris,et al. The ILSP / INNOETICS Text-to-Speech System for the Blizzard Challenge 2014 , 2013 .
[101] J. J. Odell,et al. The Use of Context in Large Vocabulary Speech Recognition , 1995 .
[102] Alan W. Black,et al. Random forests for statistical speech synthesis , 2015, INTERSPEECH.
[103] E. Paulus,et al. Speech Signal Processing , 1997, The Electrical Engineering Handbook - Six Volume Set.
[104] Heiga Zen,et al. Directly modeling voiced and unvoiced components in speech waveforms by neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[105] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[106] Simon King,et al. Investigating the shortcomings of HMM synthesis , 2013, SSW.
[107] Vincent Pollet,et al. Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis , 2012, INTERSPEECH.
[108] Paavo Alku,et al. Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[109] Zhen-Hua Ling,et al. DBN-based Spectral Feature Representation for Statistical Parametric Speech Synthesis , 2016, IEEE Signal Processing Letters.
[110] Robert A. J. Clark,et al. A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[111] Zhizheng Wu,et al. Deep neural network-guided unit selection synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[112] Bhuvana Ramabhadran,et al. Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system , 2015, INTERSPEECH.
[113] Heiga Zen,et al. Decision tree-based context clustering based on cross validation and hierarchical priors , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[114] Michal Kaszczuk. Evaluating Ivona Speech Synthesis System for Blizzard Challenge 2006 , 2006 .
[115] Zhizheng Wu,et al. From HMMs to DNNs: Where do the improvements come from? , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[116] Vincent Pollet,et al. Synthesis by generation and concatenation of multiform segments , 2008, INTERSPEECH.
[117] David Suendermann,et al. Challenges in Speech Synthesis , 2010 .
[118] Paavo Alku,et al. The GlottHMM Entry for Blizzard Challenge 2011: Utilizing Source Unit Selection in HMM-Based Speech Synthesis for Improved Excitation Generation , 2011 .
[119] Paul Taylor,et al. Text-to-Speech Synthesis , 2009 .
[120] Stephen Isard,et al. Optimal coupling of diphones , 1994, SSW.
[121] Simon King,et al. Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders , 2012, INTERSPEECH.
[122] Zhi-Jie Yan,et al. An HMM trajectory tiling (HTT) approach to high quality TTS , 2010, INTERSPEECH.
[123] Takayoshi Yoshimura,et al. Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based text-to-speech systems , 2002 .
[124] Paul Taylor,et al. Automatically clustering similar units for unit selection in speech synthesis , 1997, EUROSPEECH.
[125] Heiga Zen,et al. Statistical parametric speech synthesis: from HMM to LSTM-RNN , 2015 .
[126] Hideki Kawahara,et al. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT , 2001, MAVEBA.
[127] Keiichi Tokuda,et al. CELP coding based on mel-cepstral analysis , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[128] Paavo Alku,et al. Comparison of formant enhancement methods for HMM-based speech synthesis , 2010, SSW.
[129] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[130] Satoshi Imai,et al. Cepstral analysis synthesis on the mel frequency scale , 1983, ICASSP.
[131] Simon King,et al. Festival 2 - build your own general purpose unit selection speech synthesiser , 2004, SSW.
[132] Phil Hoole,et al. Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus , 2011, INTERSPEECH.
[133] João P. Cabral. HMM-based Speech Synthesis Using an Acoustic Glottal Source Model , 2011 .
[134] Paavo Alku,et al. Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise , 2014, Comput. Speech Lang.