VocaliD: personalizing text-to-speech synthesis for individuals with severe speech impairment

Speech synthesis options on assistive communication devices are limited and do not reflect the user's vocal quality or personality. Previous work suggests that speakers with severe speech impairment can control prosodic aspects of their voice and often retain the ability to produce sustained vowel-like utterances. This project leverages these residual phonatory abilities to build an adaptive text-to-speech synthesizer that is intelligible yet conveys the user's vocal identity. To produce a personalized voice, our VocaliD system uses voice transformation techniques to combine the source characteristics of the disordered speaker with the filter characteristics of an age-matched healthy speaker. Usability testing indicated that listeners were 94% accurate in transcribing morphed samples and 79.5% accurate in matching morphed samples from the same speaker.
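
The source-filter idea behind this combination can be illustrated with a minimal sketch. This is not the VocaliD implementation, which is not described here; it is a textbook source-filter vowel synthesizer under simplifying assumptions. The function names (pulse_train, resonator, synthesize_vowel), the impulse-train excitation, the cascade of two-pole resonators, and all numeric values are hypothetical choices made for illustration only. The excitation stands in for the source (pitch) taken from the target user, and the formant list stands in for the filter taken from a donor speaker.

```python
import math

def pulse_train(f0_hz, duration_s, sample_rate):
    """Impulse-train excitation: a crude stand-in for the user's voice source."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))  # samples per glottal cycle
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator approximating one formant of a vocal-tract filter."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, stable)
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.2, sample_rate=16000):
    """Pass the source excitation through a cascade of formant resonators."""
    signal = pulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]  # normalize to the range [-1, 1]

# Illustrative placeholder values, not measurements from the study:
user_f0_hz = 180.0                                    # pitch from the target user
donor_formants = [(730, 90), (1090, 110), (2440, 170)]  # (frequency, bandwidth) in Hz
samples = synthesize_vowel(user_f0_hz, donor_formants)
```

In this sketch, changing user_f0_hz alters the perceived pitch while the vowel quality set by donor_formants stays the same, which is the separation of source and filter that the approach relies on.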