Efficient weight training for selection-based synthesis

To train the weights $w_j$ and $w^C_j$, an automatic scheme has been suggested [1] [2] that uses an objective acoustic distance between the cepstral vectors of an original utterance and those of its time-aligned, re-synthesized version. An exhaustive search over many weight combinations selects the combination that produces the synthetic version closest to the natural speech. Because of the large number of parameters, this search is computationally very intensive (hundreds of CPU hours).
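The exhaustive search described above can be illustrated with a minimal Python sketch. It is not the implementation of [1] [2]: the mean Euclidean distance, the weight grid, the toy candidate data, and the names `cepstral_distance`, `exhaustive_weight_search`, `toy_synthesize`, and `CANDIDATES` are all assumptions made for illustration, and the toy selector uses only per-frame weighted sub-costs with no concatenation cost.

```python
import itertools
import math


def cepstral_distance(natural, synthetic):
    """Mean Euclidean distance between time-aligned cepstral frames."""
    if len(natural) != len(synthetic):
        raise ValueError("sequences must be time-aligned (equal length)")
    total = sum(math.dist(n, s) for n, s in zip(natural, synthetic))
    return total / len(natural)


def exhaustive_weight_search(natural, synthesize, grid, n_weights):
    """Try every weight combination on the grid and keep the closest one."""
    best_weights, best_distance = None, math.inf
    for weights in itertools.product(grid, repeat=n_weights):
        distance = cepstral_distance(natural, synthesize(weights))
        if distance < best_distance:
            best_weights, best_distance = weights, distance
    return best_weights, best_distance


# Hypothetical toy data: for each frame, a list of candidate units given as
# (sub-costs, cepstral vector). Real systems would draw these from a unit
# database and would also include concatenation costs.
CANDIDATES = [
    [((1.0, 0.0), (0.5, 0.5)), ((0.0, 1.0), (1.0, 0.0))],
    [((1.0, 0.0), (1.0, 1.0)), ((0.0, 1.0), (0.0, 1.0))],
]
NATURAL = [(1.0, 0.0), (0.0, 1.0)]  # cepstra of the "original" utterance


def toy_synthesize(weights):
    """Select, per frame, the candidate with the lowest weighted cost."""
    selected = []
    for frame_candidates in CANDIDATES:
        _, cepstrum = min(
            frame_candidates,
            key=lambda cand: sum(w * c for w, c in zip(weights, cand[0])),
        )
        selected.append(cepstrum)
    return selected


best_weights, best_distance = exhaustive_weight_search(
    NATURAL, toy_synthesize, grid=(0.0, 0.5, 1.0), n_weights=2
)
print(best_weights, best_distance)
```

The loop evaluates `len(grid) ** n_weights` combinations and re-synthesizes the utterance for each one, which is why the cost grows so quickly with the number of weights.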