Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices

The success of spoken communication depends not only on the intelligibility of the speech transmitted to the listener but also on how the message is spoken. Besides the linguistic content, an important carrier of information in speech is the type of voice. Humans change their voice, instinctively or intentionally, depending on their mood, the environment, the listener, the feelings they want to convey, and so on. To exploit this valuable aspect of speech in applications that use synthetic voices for spoken communication, the computer must be able to produce a wide variety of voices and to predict an appropriate voice from environmental cues, including feedback about the listener. This work addresses the modelling and transformation of acoustic aspects of speech in order to control the type of synthetic voice. The goal is to accurately model aspiration noise, an important acoustic component of speech related to voice characteristics. This noise results from the turbulence of air passing through the glottis during human speech production. It can be represented as amplitude-modulated Gaussian noise whose amplitude depends on the glottal volume velocity and the glottal area. For example, this modulation effect is more important in breathy voice than in modal voice because in breathy voice the vocal folds usually do not close completely, whereas in modal voice they do.
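The idea of aspiration noise as amplitude-modulated Gaussian noise can be illustrated with a minimal Python sketch. It is not the model developed in this work. It assumes a hypothetical raised-cosine glottal flow pulse, a constant leakage term standing in for incomplete glottal closure, and the simplest possible modulation rule, in which the noise amplitude is proportional to the glottal volume velocity. The dependence on glottal area is not modelled, and the names `glottal_flow`, `aspiration_noise`, `open_quotient` and `leak` are illustrative only.

```python
import math
import random


def glottal_flow(n_samples, fs, f0, open_quotient, leak):
    """Hypothetical glottal volume velocity (arbitrary units).

    A raised-cosine pulse during the open phase of each cycle, plus a
    constant `leak` term that models incomplete closure of the vocal folds
    (leak = 0 means the folds close completely, as assumed for modal voice).
    """
    period = fs / f0  # samples per glottal cycle
    flow = []
    for n in range(n_samples):
        phase = (n % period) / period  # position within the cycle, in [0, 1)
        if phase < open_quotient:
            # Open phase: smooth pulse rising from 0 to 1 and back to 0.
            pulse = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase / open_quotient))
        else:
            pulse = 0.0  # Closed phase: no pulsatile flow.
        flow.append(leak + pulse)
    return flow


def aspiration_noise(flow, noise_gain, seed=0):
    """Amplitude-modulated Gaussian noise.

    White Gaussian noise scaled sample by sample by the glottal flow
    (an assumed, simplified modulation rule).
    """
    rng = random.Random(seed)
    return [noise_gain * u * rng.gauss(0.0, 1.0) for u in flow]


if __name__ == "__main__":
    fs, f0 = 16000, 100  # sampling rate (Hz) and fundamental frequency (Hz)
    modal = aspiration_noise(glottal_flow(1600, fs, f0, 0.6, leak=0.0), 0.1)
    breathy = aspiration_noise(glottal_flow(1600, fs, f0, 0.8, leak=0.3), 0.1)
    print("modal noise energy:  ", sum(x * x for x in modal))
    print("breathy noise energy:", sum(x * x for x in breathy))
```

With `leak=0.0` the noise vanishes during each closed phase, so it appears only in bursts while the glottis is open. With a positive `leak`, noise is present throughout the cycle and carries more energy overall, which is the qualitative difference between the modal and breathy settings that the sketch is meant to show.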