Active physical inference via reinforcement learning

When encountering unfamiliar physical objects, children and adults often perform structured interrogatory actions, such as grasping and prodding, thereby revealing latent physical properties such as mass and texture. However, the processes driving and supporting these curious behaviors remain largely mysterious. In this paper, we develop and train an agent able to actively uncover latent physical properties, such as the mass and force of objects, in a simulated physical "micro-world". Concretely, we use a simulation-based inference framework to quantify the physical information produced by observing and interacting with the evolving dynamic environment. We use a model-free reinforcement learning algorithm to train an agent to implement general strategies for revealing latent physical properties, and we compare the behavior of this agent to human behavior observed in a similar task.
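The core quantity in such a framework — the physical information produced by an interaction — can be illustrated with a minimal sketch. The scenario, dynamics, and all names below are hypothetical simplifications, not the paper's implementation: an agent pushes a block of unknown mass with a known force, observes the resulting displacement, performs a Bayesian update over a discrete set of candidate masses, and measures the information gained as the reduction in belief entropy.

```python
import math

# Hypothetical micro-world: a block of unknown mass is pushed with a known
# force. Observing its displacement lets the agent update a belief over
# candidate masses; information gain = prior entropy - posterior entropy.

def displacement(mass, force, dt=1.0):
    # Frictionless kinematics for the sketch: s = 0.5 * (F / m) * t^2
    return 0.5 * (force / mass) * dt ** 2

def gaussian_likelihood(obs, pred, sigma=0.05):
    # Unnormalized Gaussian observation model for a noisy displacement sensor
    return math.exp(-0.5 * ((obs - pred) / sigma) ** 2)

def entropy(p):
    # Shannon entropy of a discrete belief (in nats)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def update_belief(prior, masses, force, obs):
    # Bayes rule over the discrete candidate masses
    weights = [p * gaussian_likelihood(obs, displacement(m, force))
               for p, m in zip(prior, masses)]
    z = sum(weights)
    return [w / z for w in weights]

masses = [1.0, 2.0, 4.0]                 # candidate latent masses
prior = [1.0 / 3.0] * 3                  # uniform prior belief
true_mass, force = 2.0, 4.0
obs = displacement(true_mass, force)     # observation (noiseless here)

posterior = update_belief(prior, masses, force, obs)
info_gain = entropy(prior) - entropy(posterior)
```

In a full agent, a quantity like `info_gain` would serve as the intrinsic reward signal that the model-free reinforcement learner maximizes, encouraging actions whose outcomes are maximally diagnostic of the latent properties.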
