Identifying Utterances Addressed to an Agent in Multiparty Human-Agent Conversations

In multiparty human-agent interaction, the agent should be able to properly respond to a user by determining whether the utterance is addressed to the agent or to another person. This study proposes a model for predicting the addressee by using the acoustic information in speech and head orientation as nonverbal information. First, we conducted a Wizard-of-Oz (WOZ) experiment to collect human-agent triadic conversations. Then, we analyzed whether the acoustic features and head orientations were correlated with addressee-hood. Based on the analysis, we propose an addressee prediction model that integrates acoustic and bodily nonverbal information using SVM.