A method of unit preselection for speech synthesis based on acoustic clustering and decision trees

This article presents a method for preselecting units in concatenative speech synthesis. The method couples unsupervised acoustic clustering with a decision tree for each type of unit. During synthesis, the trees predict the acoustic classes for a given symbolic target, and the units belonging to the predicted classes are retained as candidates for the final selection. The units used in this study are phones and diphones, but the methodology is entirely automatic and may be applied to any type of unit or to any language. The proposed method is evaluated against a handcrafted method in a formal listening test.
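The preselection pipeline described above can be illustrated with a minimal sketch for a single unit type. Everything concrete here is an assumption made for illustration and is not taken from the article: k-means stands in for the unspecified unsupervised clustering, the tree is a simple entropy-based tree over categorical features, and the feature names (`stress`, `position`), the two-dimensional acoustic vectors, and the unit identifiers are hypothetical toy data. In a full system, one clustering and one tree would be built per unit type.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow a tree on categorical symbolic features; every node keeps its class set."""
    node = {"classes": set(labels)}
    if len(node["classes"]) <= 1 or not features:
        return node

    def split_entropy(f):
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[f], []).append(lab)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=split_entropy)
    if split_entropy(best) >= entropy(labels):
        return node  # no split reduces class entropy: stay a leaf
    rest = [f for f in features if f != best]
    node["feature"] = best
    node["children"] = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def predict_classes(tree, target):
    """Descend the tree for a symbolic target; back off on unseen feature values."""
    while "feature" in tree:
        child = tree["children"].get(target.get(tree["feature"]))
        if child is None:
            break
        tree = child
    return tree["classes"]


def preselect(units, labels, tree, target):
    """Keep only the units whose acoustic class the tree predicts for the target."""
    wanted = predict_classes(tree, target)
    return [u["id"] for u, lab in zip(units, labels) if lab in wanted]


# Hypothetical toy inventory for one unit type (values are made up).
units = [
    {"id": "a_0", "context": {"stress": "yes", "position": "initial"}, "acoustic": (0.9, 0.8)},
    {"id": "a_1", "context": {"stress": "no", "position": "initial"}, "acoustic": (0.1, 0.2)},
    {"id": "a_2", "context": {"stress": "yes", "position": "final"}, "acoustic": (1.0, 0.9)},
    {"id": "a_3", "context": {"stress": "no", "position": "final"}, "acoustic": (0.2, 0.1)},
    {"id": "a_4", "context": {"stress": "yes", "position": "initial"}, "acoustic": (0.8, 1.0)},
    {"id": "a_5", "context": {"stress": "no", "position": "final"}, "acoustic": (0.0, 0.3)},
]

labels = kmeans([u["acoustic"] for u in units], k=2)
tree = build_tree([u["context"] for u in units], labels, ["stress", "position"])
candidates = preselect(units, labels, tree, {"stress": "yes", "position": "final"})
print(candidates)  # ['a_0', 'a_2', 'a_4']
```

In this sketch each tree node stores the set of acoustic classes seen in its training data, so a target can map to several classes and an unseen feature value falls back to the classes of the last node reached. That is one possible way to realise "predicted classes"; the article's own criterion may differ.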