Paper Information - Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network

It is crucial to ask how agents can achieve goals by generating action plans using only partial models of the world acquired through habituated sensory-motor experiences. Although many existing robotics studies use a forward model framework, that framework generalizes poorly when the system has many degrees of freedom. The current study shows that the predictive coding (PC) and active inference (AIF) frameworks, which employ a generative model, can generalize better by learning a prior distribution in a low-dimensional latent state space that represents probabilistic structures extracted from well-habituated sensory-motor trajectories. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights that maximize the evidence lower bound, while goal-directed planning is accomplished by inferring latent variables that maximize the estimated lower bound. Our proposed model was evaluated on both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization when learning from limited training data, provided the regularization coefficient was set to an intermediate value. Furthermore, comparative simulation results show that the proposed model outperforms a conventional forward model in goal-directed planning, because the learned prior confines the search for motor plans to the range of habituated trajectories.
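The idea of planning by latent inference under a prior can be illustrated with a deliberately small toy. The sketch below is not the paper's variational recurrent network: it assumes a hypothetical linear decoder `W`, a zero-mean unit-variance Gaussian prior on the latent `z`, and a hypothetical weight `w_reg` standing in for the regularization coefficient. Planning is gradient ascent on a goal log-likelihood term minus a prior penalty, which is the simplest form of the trade-off the abstract describes.

```python
# Toy sketch only (NOT the paper's model): infer a latent z that reaches a goal
# while staying close to a prior. All names here are illustrative assumptions.
# Objective: -0.5 * ||W z - goal||^2  -  0.5 * w_reg * ||z||^2
# The first term rewards reaching the goal; the second is the pull toward a
# zero-mean, unit-variance Gaussian prior, scaled by w_reg.

def decode(W, z):
    """Hypothetical linear decoder: map latent z to a predicted observation."""
    return [sum(a * b for a, b in zip(row, z)) for row in W]

def objective(W, z, goal, w_reg):
    """Value of the toy objective for a given latent z."""
    pred = decode(W, z)
    err = sum((p - g) ** 2 for p, g in zip(pred, goal))
    reg = sum(v ** 2 for v in z)
    return -0.5 * err - 0.5 * w_reg * reg

def plan(W, goal, w_reg, steps=2000, lr=0.05):
    """Gradient ascent on z, starting from the prior mean (all zeros)."""
    z = [0.0] * len(W[0])
    for _ in range(steps):
        resid = [p - g for p, g in zip(decode(W, z), goal)]
        # Gradient of the objective with respect to z: -W^T resid - w_reg * z
        grad = [
            -sum(W[i][j] * resid[i] for i in range(len(W))) - w_reg * z[j]
            for j in range(len(z))
        ]
        z = [v + lr * g for v, g in zip(z, grad)]
    return z

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
goal = [2.0, -1.0, 1.0]
z_free = plan(W, goal, w_reg=0.0)  # no prior: fit the goal as closely as possible
z_reg = plan(W, goal, w_reg=1.0)   # with prior: solution is pulled toward zero
```

With these values, `z_free` converges to approximately (2, -1), which reproduces the goal exactly, while `z_reg` converges to approximately (1.125, -0.375). The regularized plan trades some goal error for staying nearer the prior mean, a crude analogue of a learned prior keeping the search within familiar territory.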

Jun Tani | Takazumi Matsumoto
