Active Perception based on Energy Minimization in Multimodal Human-robot Interaction

Humans use various modalities to express their internal states. If a robot interacting with humans can attend to only a limited number of signals, it should select the more informative ones to estimate its partner's state. We propose an active perception method that controls the robot's attention based on an energy minimization criterion. An energy-based model, trained to estimate the latent state from sensory signals, assigns energy values corresponding to the occurrence probabilities of the signals: the lower the energy, the higher the likelihood. Our method therefore selects the modality with the lowest expected energy among the available ones, thereby exploiting the most frequently experienced signals. We employ a multimodal deep belief network to represent the relationships between humans' states and their expressions. In an emotion estimation task, our method selected modalities more effectively than other methods. We discuss the potential of our method to advance human-robot interaction.
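
The following is a minimal sketch of the selection criterion described above, not the paper's implementation: it assumes a single binary restricted Boltzmann machine per modality (the paper uses a multimodal deep belief network), and all names (`rbm_free_energy`, `select_modality`, the parameter layout) are hypothetical. It scores each modality by the expected free energy of its candidate observations and picks the lowest, since lower energy corresponds to higher likelihood under the learned model.

```python
import numpy as np

def rbm_free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM:
    F(v) = -v . b_vis - sum_j softplus(b_hid_j + v . W[:, j]).
    Lower free energy corresponds to higher unnormalized probability of v."""
    return -v @ b_vis - np.sum(np.logaddexp(0.0, b_hid + v @ W))

def select_modality(candidates, models):
    """candidates: {modality: (n_samples, dim) array of candidate observations}
    models: {modality: (W, b_vis, b_hid)} -- one RBM per modality (assumed layout).
    Returns the modality whose candidates have the lowest expected free energy,
    i.e. the signals the model finds most familiar."""
    scores = {}
    for modality, samples in candidates.items():
        W, b_vis, b_hid = models[modality]
        scores[modality] = np.mean(
            [rbm_free_energy(v, W, b_vis, b_hid) for v in samples]
        )
    return min(scores, key=scores.get)
```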
