Dialogue Control Algorithm for Ambient Intelligence Based on Partially Observable Markov Decision Processes

To support users' natural dialogue communication with conversational agents, dialogue management must determine each agent action with probabilistic methods that cope with the noisy data collected through sensors in the real world. We argue that Partially Observable Markov Decision Processes (POMDPs) are well suited to such action control systems. The agents must flexibly choose actions that lead to states suitable for the users while retaining as many of the statistical characteristics of the data as possible. We offer two technical contributions to resolve this issue: the automatic acquisition of the POMDPs' state transition probabilities from a large amount of dialogue data through Dynamic Bayesian Networks (DBNs), and the incorporation of rewards derived from the emission probabilities of agent actions into the POMDPs' reinforcement learning. This paper thus proposes a method that simultaneously achieves purpose-oriented and stochastically naturalness-oriented action control. Our experimental results demonstrate the effectiveness of this framework, showing that the agent can generate both kinds of actions without being locked into either.
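
To make the first contribution concrete, the sketch below estimates state transition probabilities by maximum likelihood from state/action-annotated dialogue logs. This is only a minimal stand-in for the paper's DBN-based acquisition: the corpus format, the state and action names, and the function `estimate_transitions` are all hypothetical.

```python
from collections import defaultdict

def estimate_transitions(dialogues):
    """Maximum-likelihood estimate of P(s' | s, a) from annotated
    dialogue logs. Each dialogue is a list of (state, action) pairs;
    this corpus format is a hypothetical stand-in for the paper's
    DBN training data."""
    counts = defaultdict(lambda: defaultdict(int))
    for dialogue in dialogues:
        # Count each observed transition (s, a) -> s'.
        for (s, a), (s_next, _) in zip(dialogue, dialogue[1:]):
            counts[(s, a)][s_next] += 1
    # Normalize the counts into conditional probabilities.
    probs = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        probs[(s, a)] = {s2: n / total for s2, n in nexts.items()}
    return probs

# Hypothetical toy corpus: two short dialogues over abstract states.
corpus = [
    [("greet", "ask"), ("request", "confirm"), ("confirm", "act")],
    [("greet", "ask"), ("request", "clarify"), ("request", "confirm")],
]
P = estimate_transitions(corpus)
print(P[("greet", "ask")])  # {'request': 1.0}
```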
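
The second contribution, folding emission probabilities into the reward, can be illustrated with a tabular Q-learning step whose reward blends the task reward with the log emission probability of the chosen action, so that frequent (natural) actions are favored without overriding the task goal. This is a simplified sketch under stated assumptions: the paper operates on POMDP beliefs, whereas this code uses observable states for brevity, and the names `LAMBDA`, `emit`, and `env_step` are assumptions, not the paper's interfaces.

```python
import math
import random
from collections import defaultdict

LAMBDA = 0.5                 # weight on the naturalness (emission) bonus; assumed value
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2  # assumed learning-rate, discount, exploration

def shaped_reward(task_reward, emit_prob):
    """Blend the task reward with the log emission probability of the
    chosen action, trading off purpose-oriented and naturalness-
    oriented behavior."""
    return task_reward + LAMBDA * math.log(max(emit_prob, 1e-9))

def q_learning_step(Q, s, actions, emit, env_step):
    """One epsilon-greedy Q-learning update using the shaped reward.
    `emit[(s, a)]` holds the corpus emission probability P(a | s);
    `env_step(s, a)` returns (next_state, task_reward, done) and is a
    hypothetical simulator interface."""
    if random.random() < EPS:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    s2, r_task, done = env_step(s, a)
    r = shaped_reward(r_task, emit.get((s, a), 1e-9))
    target = r if done else r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return s2, done

# Q-values default to 0.0 for unseen state-action pairs.
Q = defaultdict(float)
```

Because the bonus enters the reward rather than the policy directly, the learned policy can still sacrifice naturalness where the task demands it, which matches the abstract's claim that neither behavior locks out the other.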