Pruning redundant synthesis units based on static and delta unit appearance frequency

To reduce the footprint of concatenative speech synthesis systems for embedded devices, this work introduces a novel method for pruning redundant units. The conventional method prunes using only a criterion based on unit appearance frequency. The new method adds the concept of “delta unit appearance frequency”, which indicates whether a unit is replaceable. Both static and delta unit appearance frequency serve as pruning criteria: only units that have comparatively high appearance frequency and cannot be replaced by other units are preserved in the database. Experiments show that the new method greatly reduces the footprint of our speech synthesis system without much loss in synthesized voice quality.
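
The two-criterion rule described in the abstract can be illustrated with a minimal sketch. The abstract does not say how either frequency is computed or thresholded, so everything concrete here is an assumption made for illustration: the function name `prune_units`, the thresholds `min_static` and `min_delta`, the convention that a larger delta value marks a unit as harder to replace, and the toy numbers. It is not the authors' implementation.

```python
from typing import Dict, Set


def prune_units(static_freq: Dict[str, float],
                delta_freq: Dict[str, float],
                min_static: float,
                min_delta: float) -> Set[str]:
    """Return the IDs of the units to keep in the database.

    Assumptions (not taken from the paper):
      - static_freq maps a unit ID to how often the unit appears
        in synthesis output;
      - delta_freq maps a unit ID to a replaceability score, where
        a larger value means the unit is harder to replace;
      - both criteria are applied as simple thresholds.
    """
    kept = set()
    for unit_id, freq in static_freq.items():
        frequent = freq >= min_static
        # Units with no delta score are treated as replaceable.
        irreplaceable = delta_freq.get(unit_id, 0.0) >= min_delta
        if frequent and irreplaceable:
            kept.add(unit_id)
    return kept


# Toy data with hypothetical unit IDs and values.
static_freq = {"a_1": 120, "a_2": 95, "a_3": 3, "b_1": 80}
delta_freq = {"a_1": 0.9, "a_2": 0.1, "a_3": 0.8, "b_1": 0.7}

kept = prune_units(static_freq, delta_freq, min_static=50, min_delta=0.5)
print(sorted(kept))  # ['a_1', 'b_1']
```

In this toy run, `a_2` is frequent but is dropped because it is marked replaceable, and `a_3` is hard to replace but is dropped because it rarely appears. Only units that pass both tests survive, which matches the rule stated in the abstract.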