An investigation of how to design control parameters for statistical voice timbre control

Multiple-regression Gaussian mixture models (MR-GMMs) allow voice timbre to be controlled along several axes, each described by a voice timbre expression word. To create these axes, perceptual scores corresponding to multiple voice timbre expression words are manually assigned to individual pre-stored target speakers as the voice timbre control parameters, and acoustic basis vectors corresponding to the individual control parameters are then learned. The voice timbre expression words are usually selected from a set of candidate words using factor analysis so that the voice timbre control parameters are independent of each other. However, the resulting basis vectors are often not orthogonal to each other, which in practice makes it difficult to control the converted voice timbre intuitively. Towards an MR-GMM capable of intuitive control of converted voice timbre, we investigate how to design the voice timbre control parameters so that not only the control parameters but also the corresponding acoustic basis vectors are independent of each other. Experimental results demonstrate that 1) annotating the voice timbre control parameters using converted voices rather than natural voices is effective, and 2) independence of both the voice timbre control parameters and the acoustic basis vectors helps improve the converted voice timbre controllability of the MR-GMM.
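The distinction between independent control parameters and orthogonal basis vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy scores and basis vectors are invented, and Pearson correlation and cosine similarity are used here as simple stand-ins for "independence" and "orthogonality", not necessarily the measures used in this work.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0 means orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson correlation between two score sequences (0 means uncorrelated)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def max_abs_pairwise(vectors, measure):
    """Largest absolute value of `measure` over all distinct pairs."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            worst = max(worst, abs(measure(vectors[i], vectors[j])))
    return worst

# Hypothetical perceptual scores: one row per control axis (expression word),
# one column per pre-stored target speaker.
scores = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]

# Hypothetical acoustic basis vectors, one per control axis.
basis = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

score_dependence = max_abs_pairwise(scores, pearson)  # uncorrelated axes
basis_overlap = max_abs_pairwise(basis, cosine)       # but overlapping bases

print(f"max |correlation| between control axes: {score_dependence:.3f}")
print(f"max |cosine| between basis vectors:     {basis_overlap:.3f}")
```

In this toy case the two control axes are uncorrelated across speakers, yet the basis vectors still overlap, so moving one control parameter would also shift the acoustics along the other axis. This is the kind of mismatch the abstract describes.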