Paper information - Limitations to concatenative speech synthesis

This paper discusses techniques for determining the linguistic needs for open-domain synthesis by concatenative methods, and reports on the design and evaluation of a tool for collecting and balancing a speech corpus automatically, in order to ensure optimal coverage of the sounds required for synthesis within a given task-domain. Synthetically generated utterances are used to prompt speakers, and in-line acoustic analysis determines the prosodic as well as phonemic balance of the resulting speech during recording, re-prompting the speaker with textually modified versions if necessary, to elicit the desired articulation sequences. The closed-loop process, which incorporates human self-correction and evaluation, allows for more efficient collection of a balanced corpus for concatenative speech synthesis.
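As a rough illustration of what "balancing for coverage" can mean, the sketch below greedily picks prompts so that a target set of sound units is covered with few utterances. It is not the tool described in the paper: the choice of diphones (adjacent phoneme pairs) as the unit, the function names `diphones` and `select_prompts`, and the toy phoneme sequences are all assumptions made for illustration, and the acoustic analysis and re-prompting steps of the closed loop are not modelled.

```python
def diphones(phonemes):
    """Return the set of adjacent phoneme pairs in one utterance."""
    return set(zip(phonemes, phonemes[1:]))


def select_prompts(candidates, target_units):
    """Greedily pick prompts until no candidate adds new target units.

    candidates: dict mapping prompt text to its phoneme sequence
                (hypothetical data, not taken from the paper).
    target_units: set of diphones the corpus should cover.
    Returns the chosen prompts and any units left uncovered.
    """
    remaining = set(target_units)
    pool = dict(candidates)
    chosen = []
    while remaining and pool:
        # Pick the prompt that covers the most still-missing units.
        best = max(pool, key=lambda text: len(diphones(pool[text]) & remaining))
        gain = diphones(pool[best]) & remaining
        if not gain:
            break  # no remaining prompt helps; leftover units stay uncovered
        chosen.append(best)
        remaining -= gain
        del pool[best]
    return chosen, remaining


# Toy example with made-up phoneme sequences.
candidates = {
    "cat": ["k", "a", "t"],
    "ka": ["k", "a"],
    "top": ["t", "o", "p"],
}
targets = {("k", "a"), ("a", "t"), ("t", "o")}
chosen, missing = select_prompts(candidates, targets)
```

In a closed-loop setting of the kind the abstract describes, the units counted as covered would presumably come from analysis of what the speaker actually produced rather than from the prompt text, with any unit still in `missing` triggering a further prompt.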

Nick Campbell

[1] Y. Sagisaka, et al. Speech synthesis by rule using an optimal selection of non-uniform synthesis units, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[2] Nick Campbell. Talking Machines for Information Access, 1999.

[3] Joseph Olive, et al. A scheme for concatenating units for speech synthesis, 1980, ICASSP.

[4] Frédéric Bimbot, et al. Introducing statistical dependencies and structural constraints in variable-length sequence models, 1996, ICGI.

[5] Tomohisa Hirokawa. Speech synthesis using a waveform dictionary, 1989, EUROSPEECH.

[6] Jonathan Allen, et al. Text to Speech, 2015.