Evaluating the effectiveness of dialogue for an automated spoken questionnaire

We present and apply an empirical methodology for evaluating the effectiveness of dialogues in spoken language systems. The methodology is particularly suited to evaluating dialogue-based systems that collect information from the user, such as an automated spoken questionnaire. Our approach to assessing effectiveness involves coding users' answers for responsiveness. To this end, we developed a behavioral coding scheme tailored to the requirements of automated spoken questionnaires that interact with users over the telephone. The codes cover a range of behaviors from “Concise” to “No response.” We have used this evaluation methodology in developing an automated spoken questionnaire for the U.S. Census. In that project, we collected over 4,000 telephone calls responding to the questionnaire. A sample of the calls was transcribed and coded with our behavioral coding scheme. We then used the coded data to choose among alternative dialogue protocols and to evaluate differences in system voice, such as natural versus synthetic and male versus female. In particular, we illustrate the utility of the methodology by testing the hypothesis that a synthesized system voice would elicit more constrained user responses than a human voice, and we report the evaluation results.
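
To make the comparison step concrete, the sketch below shows one way coded call transcripts might be tallied per experimental condition (for example, synthetic versus human voice). It is a minimal illustration, not the paper's implementation: the intermediate code labels between “Concise” and “No response,” and the names CodedTurn and code_distribution, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

# Hypothetical behavioral codes. The paper's scheme spans "Concise" to
# "No response"; the intermediate categories here are illustrative only.
class ResponseCode(Enum):
    CONCISE = "Concise"
    ELABORATED = "Elaborated"
    OFF_TOPIC = "Off-topic"
    NO_RESPONSE = "No response"

@dataclass
class CodedTurn:
    condition: str       # e.g. "synthetic voice" or "human voice"
    code: ResponseCode   # behavioral code assigned by a coder

def code_distribution(turns, condition):
    """Proportion of each behavioral code within one experimental condition."""
    counts = Counter(t.code for t in turns if t.condition == condition)
    total = sum(counts.values()) or 1
    return {code: counts.get(code, 0) / total for code in ResponseCode}

# Toy usage: compare how constrained responses are under the two voice conditions.
sample = [
    CodedTurn("synthetic voice", ResponseCode.CONCISE),
    CodedTurn("synthetic voice", ResponseCode.CONCISE),
    CodedTurn("human voice", ResponseCode.ELABORATED),
    CodedTurn("human voice", ResponseCode.NO_RESPONSE),
]
for cond in ("synthetic voice", "human voice"):
    dist = code_distribution(sample, cond)
    print(cond, {c.value: round(p, 2) for c, p in dist.items()})
```

In practice, distributions like these would be computed over the transcribed and coded sample of calls, and a higher share of “Concise” codes under one condition would be evidence that that condition elicits more constrained responses.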