Generation of multiple synthesis inventories by a bootstrapping procedure
In concatenative speech synthesis systems, generating a unit inventory is a tedious task, yet some applications demand multiple voices. A semiautomatic method for generating unit inventories is proposed. The units are segmented out of carrier phrases by dynamic time warping (DTW) alignment with a synthesized utterance, which requires at least one existing inventory. The method is therefore a bootstrapping procedure: the availability of several existing inventories improves the likelihood of finding one with similar voice characteristics, which in turn improves the accuracy of the results. To choose the best segmentation out of a set (e.g. one alignment per voice already implemented), a penalty system that uses timing constraints was developed. The results were compared with manually corrected segmentations and show the validity of this approach.
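The two steps named in the abstract, DTW-based boundary transfer and penalty-based selection among candidate segmentations, can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional features, the absolute-difference local cost, the function names (`dtw_path`, `map_boundaries`, `timing_penalty`, `best_segmentation`) and the simple duration-window penalty are all assumptions made for illustration, since the abstract does not specify the features or the actual penalty rules.

```python
from math import inf


def dtw_path(ref, tgt):
    """Return a minimum-cost DTW path as (ref_frame, tgt_frame) pairs.

    Assumption: frames are scalar features compared by absolute difference.
    """
    n, m = len(ref), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def map_boundaries(path, ref_boundaries):
    """Map boundary frames of the synthesized reference onto the recording.

    Each reference boundary is mapped to the first target frame aligned to it.
    """
    first = {}
    for i, j in path:
        first.setdefault(i, j)
    return [first[b] for b in ref_boundaries]


def timing_penalty(boundaries, min_len, max_len):
    """Hypothetical penalty: count segments whose duration (in frames)
    falls outside the window [min_len, max_len]."""
    durations = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return sum(1 for d in durations if d < min_len or d > max_len)


def best_segmentation(candidates, min_len, max_len):
    """Pick the candidate (keyed by reference voice) with the lowest penalty."""
    return min(
        candidates,
        key=lambda voice: timing_penalty(candidates[voice], min_len, max_len),
    )
```

In this sketch, `ref` would hold features of the utterance synthesized with an existing inventory, whose unit boundaries are known, and `tgt` the features of the new speaker's carrier phrase. Running `map_boundaries` once per existing voice yields the set of candidate segmentations from which `best_segmentation` selects.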