Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
Chung-Hsien Wu | Yi-Chin Huang | Chia-Ping Chen | Kuan-De Lee