Duration modeling and memory optimization in a Mandarin TTS system
Current speech synthesis efforts, both in research and in applications, are dominated by methods based on the concatenation of spoken units. Further progress in concatenative text-to-speech (TTS) technology can come mainly from two directions: reducing the memory footprint so that the system can be integrated into embedded systems, or improving the quality of the synthesized speech in terms of intelligibility and naturalness. In this paper, we focus on memory footprint reduction in a Mandarin TTS system. We show that significant memory reductions can be achieved through duration modeling and memory optimization of the lexicon data. The experimental results indicate that the memory requirements of the duration data and the lexicon can be significantly reduced without affecting speech quality. For practical embedded implementations, this is a significant step towards an efficient TTS engine. The applicability of the approach is verified in the speech synthesis system.