Corpus-based unit selection for natural-sounding speech synthesis
[1] Steven C. Lee. Probabilistic segmentation for segment-based speech recognition , 1998 .
[2] Timothy J. Hazen,et al. Pronunciation modeling using a finite-state transducer representation , 2005, Speech Commun.
[3] Peter Ladefoged,et al. The Revised International Phonetic Alphabet , 1990 .
[4] Alan W. Black,et al. Perfect synthesis for all of the people all of the time , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.
[5] Victor Zue,et al. Properties of large lexicons: Implications for advanced isolated word recognition systems , 1982, ICASSP.
[6] Han Shu,et al. EM training of finite-state transducers and its application to pronunciation modeling , 2002, INTERSPEECH.
[7] Jan P. H. van Santen,et al. Combinatorial issues in text-to-speech synthesis , 1997, EUROSPEECH.
[8] David J. Goodman,et al. Personal Communications , 1994, Mobile Communications.
[9] P. Frasconi,et al. Representation of Finite State Automata in Recurrent Radial Basis Function Networks , 1996, Machine Learning.
[10] Mitchell P. Marcus,et al. Parsing a Natural Language Using Mutual Information Statistics , 1990, AAAI.
[11] Thierry Dutoit,et al. MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database , 1993, Speech Commun.
[12] C. Lee Giles,et al. Constructing deterministic finite-state automata in recurrent neural networks , 1996, JACM.
[13] Y. Sagisaka,et al. Speech synthesis by rule using an optimal selection of non-uniform synthesis units , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.
[14] Michael W. Macon,et al. Control of spectral dynamics in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process.
[15] Gregory A. Sanders,et al. DARPA communicator dialog travel planning systems: the June 2000 data collection , 2001, INTERSPEECH.
[16] Alex Acero,et al. Whistler: a trainable text-to-speech system , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[17] Emmanuel Roche,et al. Finite-State Language Processing , 1997 .
[18] Robert E. Donovan,et al. A new distance measure for costing spectral discontinuities in concatenative speech synthesizers , 2001, SSW.
[19] S. Nakajima,et al. Automatic generation of synthesis units based on context oriented clustering , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.
[20] Mari Ostendorf,et al. ToBI: a standard for labeling English prosody , 1992, ICSLP.
[21] Mehryar Mohri,et al. Rapid unit selection from a large speech corpus for concatenative speech synthesis , 1999, EUROSPEECH.
[22] J. Olive,et al. Rule synthesis of speech from dyadic units , 1977 .
[23] Frédéric Bimbot,et al. Inference of variable-length linguistic and acoustic units by multigrams , 1997, Speech Commun..
[24] Yannis Stylianou,et al. Applying the harmonic plus noise model in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process.
[25] Barbara Heuft,et al. Emotions in time domain synthesis , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[26] G. Schwarz. Estimating the Dimension of a Model , 1978 .
[27] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .
[28] T. Feustel,et al. Capacity Demands in Short-Term Memory for Synthetic and Natural Speech , 1983, Human factors.
[29] D. Talkin. Speech formant trajectory estimation using dynamic programming with modulated transition costs , 1987 .
[30] Bernd Möbius. Corpus-based speech synthesis: Methods and challenges , 2000 .
[31] Doroteo Torre Toledano,et al. Trying to mimic human segmentation of speech using HMM and fuzzy logic post-correction rules , 1998, SSW.
[32] John Cocke,et al. A Statistical Approach to Machine Translation , 1990, CL.
[33] Shin'ya Nakajima. English speech synthesis based on multi-layered context oriented clustering; towards multi-lingual speech synthesis , 1993, EUROSPEECH.
[34] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[35] Grace Chung. Automatically incorporating unknown words in JUPITER , 2000, INTERSPEECH.
[36] Richard Sproat,et al. Multilingual Text-to-Speech Synthesis: The Bell Labs Approach , 1998, CL.
[37] Michael W. Macon,et al. A perceptual evaluation of distance measures for concatenative speech synthesis , 1998, ICSLP.
[38] Benjamin M. Serridge. Context-dependent modeling in a segment-based speech recognition system , 1997 .
[39] V.W. Zue,et al. The use of speech knowledge in automatic speech recognition , 1985, Proceedings of the IEEE.
[40] Marc C. Beutnagel,et al. The AT&T NEXT-GEN TTS system , 1999 .
[41] Chian Chuu. LIESHOU: A Mandarin Conversational Task Agent for the Galaxy-II Architecture , 2003 .
[42] Raymond N. J. Veldhuis,et al. On the reduction of concatenation artefacts in diphone synthesis , 1998, ICSLP.
[43] Iain R. Murray,et al. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion , 1993, The Journal of the Acoustical Society of America.
[44] Mari Ostendorf,et al. Joint prosody prediction and unit selection for concatenative speech synthesis , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[45] C. G. Fant. Descriptive analysis of the acoustic aspects of speech , 1962 .
[46] John Nicholas Holmes,et al. Speech synthesis , 1972 .
[47] P. Taylor,et al. Analysis and synthesis of intonation using the Tilt model , 2000, The Journal of the Acoustical Society of America.
[48] Stephanie Seneff,et al. Intelligent barge-in in conversational systems , 2000, INTERSPEECH.
[49] Yong Zhao,et al. Perpetually optimizing the cost function for unit selection in a TTS system with one single run of MOS evaluation , 2002, INTERSPEECH.
[50] A. Gray,et al. Distance measures for speech processing , 1976 .
[51] Raymond N. J. Veldhuis,et al. Reducing audible spectral discontinuities , 2001, IEEE Trans. Speech Audio Process.
[52] Robert I. Damper,et al. A multistrategy approach to improving pronunciation by analogy , 2000, CL.
[53] Yannis Stylianou,et al. Exploration of acoustic correlates in speaker selection for concatenative synthesis , 1998, ICSLP.
[54] James R. Glass,et al. Real-time telephone-based speech recognition in the Jupiter domain , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[55] Erhard Rank,et al. Generating emotional speech with a concatenative synthesizer , 1998, ICSLP.
[56] Paul Taylor,et al. A Phonetic Model of English Intonation , 1992 .
[57] Mari Ostendorf,et al. The impact of speech recognition on speech synthesis , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.
[58] Victor W. Zue,et al. Lexical stress and its application in large vocabulary speech recognition , 1984 .
[59] James R. Glass,et al. Natural-sounding speech synthesis using variable-length units , 1998, ICSLP.
[60] Hu Peng,et al. An objective measure for estimating MOS of synthesized speech , 2001, INTERSPEECH.
[61] David Talkin,et al. Voicing epoch determination with dynamic programming , 1989 .
[62] Stephanie Seneff,et al. The development of the MIT Lisp-machine based speech research workstation , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[63] James R. Glass,et al. A probabilistic framework for feature-based speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[64] Yannis Stylianou. Removing linear phase mismatches in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process.
[65] Victor Zue,et al. MUXING: a telephone-access Mandarin conversational system , 2000, INTERSPEECH.
[66] Andrej Ljolje,et al. Automatic segmentation of speech for TTS , 1993, EUROSPEECH.
[67] Thomas F. Quatieri,et al. Speech analysis/synthesis based on a sinusoidal representation , 1986, IEEE Trans. Acoust. Speech Signal Process.
[68] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984 .
[69] D. H. Klatt,et al. Review of text-to-speech conversion for English , 1987, The Journal of the Acoustical Society of America.
[70] Albert S. Bregman,et al. Auditory Scene Analysis: The Perceptual Organization of Sound , 1990 .
[71] Ann K. Syrdal,et al. Preselection of candidate units in a unit selection-based text-to-speech synthesis system , 2000, INTERSPEECH.
[72] R. Likert,et al. New Patterns of Management , 1963 .
[73] Hu Peng,et al. A concatenative Mandarin TTS system without prosody model and prosody modification , 2001, SSW.
[74] F. Jelinek,et al. Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.
[75] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[76] Anne Rogers,et al. Parallel Speech Recognition , 2004, International Journal of Parallel Programming.
[77] Joseph Polifroni,et al. Formal and natural language generation in the Mercury conversational system , 2000, INTERSPEECH.
[78] Alan W. Black,et al. Limited domain synthesis , 2000, INTERSPEECH.
[79] G. E. Peterson,et al. Segmentation Techniques in Speech Synthesis , 1958 .
[80] Victor Zue,et al. Mokusei: a telephone-based Japanese conversational system in the weather domain , 2001, INTERSPEECH.
[81] David R. Williams,et al. Synthesis of initial (/s/-) stop-liquid clusters using HLsyn , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[82] Victor Zue,et al. A model of lexical access from partial phonetic information , 1984, ICASSP.
[83] J.P.H. van Santen,et al. Compression of acoustic inventories using asynchronous interpolation , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.
[84] Alexander Kain,et al. High-resolution voice transformation , 2001 .
[85] Yannis Stylianou,et al. Perceptual and objective detection of discontinuities in concatenative speech synthesis , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[86] Eric Brill,et al. Deducing linguistic structure from the statistics of large corpora , 1990 .
[87] Jörn Ostermann,et al. Multimodal speech synthesis , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[88] David B. Pisoni,et al. Text-to-speech: the MITalk system , 1987 .
[89] Philip C. Woodland,et al. Automatic speech synthesiser parameter estimation using HMMs , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[90] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[91] Mari Ostendorf,et al. Efficient integrated response generation from multiple targets using weighted finite state transducers , 2002, Comput. Speech Lang.
[92] Paul Taylor,et al. Speech synthesis by phonological structure matching , 1999, EUROSPEECH.
[93] Min Tang,et al. Voice transformations: from speech synthesis to mammalian vocalizations , 2001, INTERSPEECH.
[94] James R. Glass,et al. Heterogeneous measurements and multiple classifiers for speech recognition , 1998, ICSLP.
[95] H. Kucera,et al. Computational analysis of present-day American English , 1967 .
[96] Rajeev Dujari,et al. Parallel Viterbi search algorithm for speech recognition , 1992 .
[97] J. Pierrehumbert. The phonology and phonetics of English intonation , 1987 .
[98] Lalit R. Bahl,et al. Design of a linguistic statistical decoder for the recognition of continuous speech , 1975, IEEE Trans. Inf. Theory.
[99] Werner Verhelst,et al. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[100] Xuejing Sun. F0 generation for speech synthesis using a multi-tier approach , 2002, INTERSPEECH.
[101] Alan W. Black,et al. Generating F0 contours from ToBI labels using linear regression , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[102] Shinya Nakajima. Automatic synthesis unit generation for English speech synthesis based on multi-layered context oriented clustering , 1994, Speech Commun.
[103] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.
[104] Stephen Isard,et al. Optimal coupling of diphones , 1994, SSW.
[105] Paul Taylor,et al. Automatically clustering similar units for unit selection in speech synthesis , 1997, EUROSPEECH.
[106] Richard Sproat. Multilingual text analysis for text-to-speech synthesis , 1996, Nat. Lang. Eng.
[107] Stephanie Seneff,et al. Response planning and generation in the MERCURY flight reservation system , 2002, Comput. Speech Lang.
[108] Adam L. Berger,et al. A Maximum Entropy Approach to Natural Language Processing , 1996, CL.
[109] D. Pisoni,et al. Speech perception without traditional speech cues , 1981, Science.
[110] Alex Acero,et al. Automatic generation of synthesis units for trainable text-to-speech systems , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[111] John H. L. Hansen,et al. A comparison of spectral smoothing methods for segment concatenation based speech synthesis , 2002, Speech Commun.
[112] Hideki Noda,et al. A MRF-based parallel processing algorithm for speech recognition using linear predictive HMM , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[113] Yoshinori Sagisaka,et al. Concatenative speech synthesis by minimum distortion criteria , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[114] Yannis Stylianou,et al. Voice selection for speech synthesis , 1997 .
[115] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm , 1977 .
[116] Bernd Möbius,et al. Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis , 2003, Int. J. Speech Technol..
[117] M. Portnoff,et al. Time-scale modification of speech based on short-time Fourier analysis , 1981 .
[118] Richard Sproat,et al. High-accuracy automatic segmentation , 1999, EUROSPEECH.
[119] Alan W. Black,et al. Optimal data selection for unit selection synthesis , 2001, SSW.
[120] Shrikanth S. Narayanan,et al. Expressive speech synthesis using a concatenative synthesizer , 2002, INTERSPEECH.
[121] S. Seneff. System to independently modify excitation and/or spectrum of speech waveform without explicit pitch extraction , 1982 .
[122] Gregory A. Sanders,et al. DARPA Communicator Evaluation: Progress from 2000 to 2001 , 2002 .
[123] R. I. Damper,et al. Stochastic phonographic transduction for English , 1996, Comput. Speech Lang.
[124] Stephanie Seneff,et al. GENESIS-II: a versatile system for language generation in conversational system applications , 2000, INTERSPEECH.
[125] Tien-Lok Jonathan Lau. SLLS: An Online Conversational Spoken Language Learning System , 2003 .
[126] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..
[127] Michael K. McCandless,et al. SAPPHIRE: an extensible speech analysis and recognition tool based on Tcl/Tk , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[128] Hu Peng,et al. Domain adaptation for TTS systems , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.