Generation of multiple synthesis inventories by a bootstrapping procedure

In concatenative speech synthesis systems, generating a unit inventory is a tedious task, yet some applications demand multiple voices. A semiautomatic method to generate unit inventories is proposed. The units are segmented out of carrier phrases by means of dynamic time warping alignment with a synthesized utterance. Because this requires at least one existing inventory, the method is a bootstrapping procedure. Having several existing inventories raises the likelihood of finding one with similar voice characteristics, which improves the accuracy of the segmentation. To choose the best segmentation out of a set (e.g. one alignment per voice already implemented), a penalty system based on timing constraints was developed. The results were compared with manually corrected segmentations and show the validity of this approach.
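The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean frame distance, the unconstrained DTW step pattern, and in particular the duration-based penalty (with its hypothetical `min_len` and `max_len` frame limits) are assumptions chosen only to show how boundaries known in a synthesized utterance might be transferred to a recorded carrier phrase and how one segmentation might be selected among several.

```python
import numpy as np

def dtw_path(ref, tgt):
    """Align two feature sequences, ref (n, d) and tgt (m, d), with plain DTW.
    Returns the warping path as a list of (ref_frame, tgt_frame) pairs."""
    n, m = len(ref), len(tgt)
    # Local cost: Euclidean distance between every pair of frames (assumption).
    cost = np.linalg.norm(ref[:, None, :] - tgt[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end of both sequences to their first frames.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1, j - 1], i - 1, j - 1),
                      (acc[i - 1, j], i - 1, j),
                      (acc[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]

def transfer_boundaries(path, ref_bounds):
    """Map boundaries (frame indices in the synthesized reference) onto the
    target: each one goes to the first target frame aligned at or after it."""
    return [next(j for i, j in path if i >= b) for b in ref_bounds]

def timing_penalty(bounds, min_len=2, max_len=40):
    """Hypothetical timing constraint: penalize every frame by which a
    segment is shorter than min_len or longer than max_len."""
    durs = np.diff(bounds)
    too_short = np.maximum(min_len - durs, 0)
    too_long = np.maximum(durs - max_len, 0)
    return float(np.sum(too_short + too_long))

def choose_segmentation(target, references):
    """references maps a voice name to (features, boundaries) of its
    synthesized utterance. Returns the voice with the lowest penalty and
    the segmentation it produced on the target recording."""
    results = {}
    for name, (feats, bounds) in references.items():
        seg = transfer_boundaries(dtw_path(feats, target), bounds)
        results[name] = (timing_penalty(seg), seg)
    best = min(results, key=lambda k: results[k][0])
    return best, results[best][1]

# Toy usage: the "recording" is the reference spoken at half speed.
ref = np.arange(6, dtype=float).reshape(-1, 1)
target = np.repeat(ref, 2, axis=0)
voices = {"a": (ref, [0, 3, 5]), "b": (ref, [0, 0, 5])}
best_voice, best_seg = choose_segmentation(target, voices)
```

In this toy case voice "b" carries a zero-length segment, so the timing penalty rejects it in favour of voice "a". A real system would use spectral features per frame and timing limits derived from the expected unit durations.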