Voice transformations: from speech synthesis to mammalian vocalizations

This paper describes a phase-vocoder-based technique for voice transformation. The method provides a flexible way to manipulate various aspects of the input signal, e.g., the fundamental frequency of voicing, duration, energy, and formant positions, without explicitly extracting these parameters. Modifications can target any of these feature dimensions independently and can vary dynamically over time. The technique has many potential applications. In concatenative speech synthesis, it can be applied to transform a speech corpus to different voice characteristics, or to smooth pitch or formant discontinuities at concatenation boundaries. It can also serve as a tool for language learning: we can modify the prosody of a student's own speech to match that of a native speaker, and use the result as guidance for improvement. Finally, the technique can convert other biological signals, such as killer whale vocalizations, into signals better suited to human auditory perception. Our initial experiments show encouraging results for all of these applications.
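The abstract does not spell out the algorithm. As a rough illustration of the kind of operation a phase vocoder performs, the sketch below is a generic textbook time-scale modification in Python with NumPy. It changes duration while preserving pitch. It is not the authors' implementation, and the function names (`stft`, `istft`, `time_stretch`), the Hann window, the 1024-sample frame and the 256-sample hop are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    # Hann-windowed short-time Fourier transform; rows are frames.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (n_frames, n_fft // 2 + 1)

def istft(X, n_fft=1024, hop=256):
    # Weighted overlap-add resynthesis, normalized by the summed squared window.
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    out = np.zeros(n_fft + hop * (X.shape[0] - 1))
    norm = np.zeros_like(out)
    for i in range(X.shape[0]):
        out[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def time_stretch(x, factor, n_fft=1024, hop=256):
    """Make x roughly `factor` times longer (factor > 1 slows it down) without changing pitch."""
    X = stft(x, n_fft, hop)
    n_bins = X.shape[1]
    # Expected phase advance over one hop at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    # Fractional analysis-frame positions read by successive synthesis frames.
    steps = np.arange(0, X.shape[0] - 1, 1.0 / factor)
    phase = np.angle(X[0])
    Y = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Linearly interpolate magnitudes between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])
        Y[k] = mag * np.exp(1j * phase)
        # Deviation of the measured phase increment from the expected one,
        # wrapped to [-pi, pi], gives each bin's true instantaneous frequency.
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase at the true frequency so output frames stay coherent.
        phase += omega + dphi
    return istft(Y, n_fft, hop)
```

For example, `time_stretch(x, 2.0)` applied to a one-second 440 Hz tone sampled at 16 kHz returns a signal roughly twice as long whose dominant frequency is still about 440 Hz. A common way to obtain pitch modification from the same routine is to time-stretch and then resample back to the original duration. That step, and the formant and energy controls mentioned in the abstract, are not shown here.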