Integration of spoken dialogue system and ubiquitous computing

With the progress of ubiquitous computing, computers and machines can now understand various human contexts through a variety of sensors. A wearable device can estimate calories burned, degree of fatigue, and even QoL (Quality of Life) by analyzing heart rate, step count, sleep quality, and so on. At the same time, significant progress in deep learning has brought drastic performance improvements not only in image recognition but also in speech and natural language processing. It is becoming a reality that a humanoid robot instantaneously recognizes what is shown in a camera image and speaks human-like sentences in a human-like voice in multiple languages. Collaboration between humans and machines has therefore already begun. In call centers, AI chatbots already handle typical inquiries on behalf of human operators. Smartwatches and activity trackers continuously monitor their owners' physical state and sometimes intervene to improve the owners' health. We are also developing digital signage that persuades passersby to change their behavior for the better. However, there is still a gap between actual human-to-human interaction and machine-to-human interaction. This means that there is some context information that the machine side is not yet aware of. For example, human beings observe slight changes in facial expression and body gestures and adjust their way of talking and their tone accordingly, but a machine cannot take such information (emotion, agreement, etc.) into consideration when it generates a dialogue. In my keynote, I will give a broad introduction to leading-edge research on context recognition in the field of ubiquitous computing. Then, I will explain the requirements that a next-generation dialogue system should meet.
In a next-generation dialogue system, it is natural for the content of the conversation and the manner of utterance to change according to the recognized context; for example, the conversation content will change according to the user's step count and stress level during the dialogue. Finally, we will discuss the technical issues involved in integrating spoken dialogue systems with ubiquitous computing.