Predicting Character-Appropriate Voices for a TTS-based Storyteller System

Using distinct and appropriate synthetic voices for the characters in a children’s story would make a TTS-based digital storyteller system more engaging and entertaining, as well as increase listeners’ comprehension of the story. However, automatically predicting appropriate voices for storybook characters is both non-trivial and largely unexplored. In this paper, we present a data-driven approach for predicting the most appropriate voices for characters in children’s stories based on salient character attributes. We use Mechanical Turk to identify the character attributes that most strongly evoke listeners’ perception that a specific character should have a particular voice, and to label the voices in our collection with attribute tags. We model the attribute-to-voice relationship with Naive Bayes. The resulting system performs significantly above chance in an objective evaluation, demonstrating the viability of our approach.
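
As an illustration of the attribute-to-voice modeling step, the following is a minimal sketch of a categorical Naive Bayes ranker with add-one smoothing. It is not the system described above: the abstract does not specify the attribute set, the feature encoding, or the Naive Bayes variant, so the attribute names (age, gender), the voice identifiers, the toy labels, and the functions train and rank_voices are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each row pairs a voice with the attribute tags one
# annotator assigned to it. None of these names or values come from the paper.
LABELS = [
    ("voice_a", {"age": "child", "gender": "female"}),
    ("voice_a", {"age": "child", "gender": "female"}),
    ("voice_b", {"age": "adult", "gender": "male"}),
    ("voice_b", {"age": "elderly", "gender": "male"}),
    ("voice_c", {"age": "adult", "gender": "female"}),
]


def train(labels):
    """Count how often each voice and each (voice, attribute, value) occurs."""
    voice_counts = defaultdict(int)
    value_counts = defaultdict(int)   # (voice, attribute, value) -> count
    values = defaultdict(set)         # attribute -> set of observed values
    for voice, attrs in labels:
        voice_counts[voice] += 1
        for attr, value in attrs.items():
            value_counts[(voice, attr, value)] += 1
            values[attr].add(value)
    return voice_counts, value_counts, values


def rank_voices(model, character, alpha=1.0):
    """Rank voices by log P(voice) + sum of log P(value | voice), best first."""
    voice_counts, value_counts, values = model
    total = sum(voice_counts.values())
    scores = {}
    for voice, n in voice_counts.items():
        score = math.log(n / total)  # log prior
        for attr, value in character.items():
            if attr not in values:
                continue  # attribute never seen in training: ignore it
            # Laplace (add-alpha) smoothing so unseen values get nonzero mass
            num = value_counts.get((voice, attr, value), 0) + alpha
            den = n + alpha * len(values[attr])
            score += math.log(num / den)
        scores[voice] = score
    return sorted(scores, key=scores.get, reverse=True)


model = train(LABELS)
ranking = rank_voices(model, {"age": "child", "gender": "female"})
print(ranking)
```

Under these assumptions, a character is described by the same attribute tags used to label the voices, and the highest-ranked voice is the one whose tag distribution best matches the character. Smoothing matters here because crowd-sourced labels are sparse, and an unsmoothed model would rule out any voice that was never tagged with one of the character's attribute values.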