Preselection of candidate units in a unit selection-based text-to-speech synthesis system

Unit selection-based speech synthesis has recently been the focus of much attention in the speech synthesis community. In general, the speech produced by such a system achieves a high degree of naturalness and good intelligibility. However, examining and selecting units for synthesis at runtime makes the unit selection process computationally expensive, and considerable effort has gone into reducing the complexity of unit selection while maintaining quality. Previous approaches to speeding up runtime unit selection have taken two directions: (1) limiting the number of candidate synthesis units considered during unit selection, which reduces the number of calculations required; and (2) precomputing part of the needed calculations, which reduces the runtime complexity. Much progress has been made using these methods, but usually at the expense of quality. We present two methods of reducing the complexity of the calculation that avoid any reduction in synthesis quality while allowing a very fast unit selection process. Results are presented for the reduction in computational complexity and for a perceptual experiment showing that quality is not reduced relative to a full unit selection process.
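
To make the cost structure concrete, the following is a minimal sketch of a generic unit selection search with candidate preselection, i.e. approach (1) above. It is not the method presented in this paper. The unit representation (a phone label plus a pitch value), the cost functions `target_cost` and `join_cost`, the `preselect` helper, and the parameter `k` are all hypothetical stand-ins chosen for illustration. The sketch assumes every target phone has at least one candidate in the inventory.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, float]    # (phone label, pitch in Hz): toy stand-in for a real unit
Target = Tuple[str, float]  # (phone label, desired pitch in Hz)


def target_cost(target: Target, unit: Unit) -> float:
    """Hypothetical target cost: distance between desired and actual pitch."""
    return abs(target[1] - unit[1])


def join_cost(left: Unit, right: Unit) -> float:
    """Hypothetical join cost: pitch mismatch at the concatenation point."""
    return 0.5 * abs(left[1] - right[1])


def preselect(target: Target, candidates: List[Unit], k: int) -> List[Unit]:
    """Keep only the k candidates with the lowest target cost."""
    return sorted(candidates, key=lambda u: target_cost(target, u))[:k]


def select_units(
    targets: List[Target], inventory: Dict[str, List[Unit]], k: int
) -> Tuple[List[Unit], float]:
    """Viterbi search over the preselected candidates.

    With at most k candidates per target, the search evaluates on the order
    of len(targets) * k * k join costs, so a smaller k means less work.
    """
    if not targets:
        return [], 0.0
    lattice = [preselect(t, inventory[t[0]], k) for t in targets]
    # best[i][j] = (cost of the cheapest path ending at lattice[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in lattice[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in lattice[i]:
            cost, back = min(
                (best[i - 1][p][0] + join_cost(prev, u), p)
                for p, prev in enumerate(lattice[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch, lowering `k` shrinks the search but can also discard a unit that lies on the globally cheapest path, which is the kind of quality loss that the abstract attributes to earlier speed-up approaches.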