Coupling dialogue and prosody computation in spoken dialogue generation

We introduce a concept-to-speech (CTS) system that generates prosodic structure compositionally within a spoken dialogue agent architecture. Representations from the semantic interpretation, task modeling, and dialogue strategy selection components drive the computation of accentuation, pitch accent type selection, and choice of melodic contour, respectively. These principled couplings of dialogue and prosody computation extend both the theory and the practice of concept-to-speech generation.
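The three couplings can be illustrated as a small pipeline in which each prosodic decision is computed by its own function, fed by a different dialogue component. The Python below is a minimal sketch under stated assumptions, not the system described in this paper. The `Word` fields, the given/new and contrast flags, the strategy names (`explicit_confirm`, `inform`), and the specific rules mapping them to ToBI-style labels (H*, L+H*, L-L%, H-H%) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical input representation. In a real CTS system these flags would be
# supplied by the semantic interpretation and task modeling components.
@dataclass
class Word:
    text: str
    is_new: bool = False          # assumed given/new status (semantic interpretation)
    is_contrastive: bool = False  # assumed contrast with a task alternative (task modeling)


def accentuation(words: List[Word]) -> List[bool]:
    """Assumed rule: accent words that convey new information; deaccent given ones."""
    return [w.is_new for w in words]


def accent_type(word: Word) -> str:
    """Assumed rule: contrastive words get L+H*, other accented words get H*."""
    return "L+H*" if word.is_contrastive else "H*"


# Assumed mapping from dialogue strategy to a final phrase accent + boundary tone.
CONTOURS = {
    "explicit_confirm": "H-H%",  # rising contour, as for a yes/no question
    "inform": "L-L%",            # falling declarative contour
}


def melodic_contour(strategy: str) -> str:
    """Look up the contour for a strategy, defaulting to a falling declarative."""
    return CONTOURS.get(strategy, "L-L%")


def generate_prosody(
    words: List[Word], strategy: str
) -> Tuple[List[Tuple[str, Optional[str]]], str]:
    """Combine the three independent decisions into one prosodic specification."""
    accented = accentuation(words)
    labeled = [(w.text, accent_type(w) if a else None) for w, a in zip(words, accented)]
    return labeled, melodic_contour(strategy)


def render(labeled: List[Tuple[str, Optional[str]]], boundary: str) -> str:
    """Produce a ToBI-style string such as 'the morning/L+H* flight H-H%'."""
    tokens = [f"{text}/{acc}" if acc else text for text, acc in labeled]
    return " ".join(tokens + [boundary])


if __name__ == "__main__":
    words = [Word("the"), Word("morning", is_new=True, is_contrastive=True), Word("flight")]
    print(render(*generate_prosody(words, "explicit_confirm")))
    # prints: the morning/L+H* flight H-H%
```

Keeping accentuation, accent type, and contour in separate functions mirrors the compositional claim: any one rule can be replaced (for example, a richer given/new model) without touching the other two.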