See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion

A robot leverages a hierarchical Bayesian representation with multisensory observations to play the game of Jenga.

Humans are able to seamlessly integrate tactile and visual stimuli with their intuitions to explore and execute complex manipulation skills. They not only see but also feel their actions. Most current robotic learning methodologies exploit recent progress in computer vision and deep learning to acquire data-hungry pixel-to-action policies. These methodologies do not exploit intuitive latent structure in physics or tactile signatures. Tactile reasoning is omnipresent in the animal kingdom, yet it is underdeveloped in robotic manipulation: tactile stimuli can only be acquired through direct physical interaction, and interpreting the resulting data stream together with visual stimuli is challenging. Here, we propose a methodology to emulate hierarchical reasoning and multisensory fusion in a robot that learns to play Jenga, a complex game that requires physical interaction to be played effectively. The game mechanics were formulated as a generative process using a temporal hierarchical Bayesian model, with representations for both behavioral archetypes and noisy block states. This model captured descriptive latent structures, and the robot learned probabilistic models of these relationships in the force and visual domains through a short exploration phase. Once learned, this representation allowed the robot to infer block behavior patterns and states as it played the game. Using its inferred beliefs, the robot adjusted its behavior with respect to both its current actions and its game strategy, similar to the way humans play the game. We evaluated the performance of the approach against three standard baselines and demonstrated its fidelity on a real-world implementation of the game.
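The abstract describes inferring latent block behavior patterns by fusing force and visual observations under a hierarchical Bayesian model. The sketch below is a minimal illustration of that idea, not the paper's implementation: it assumes three hypothetical archetype labels and Gaussian likelihoods whose parameters would be fit during the short exploration phase; all class names, parameters, and numeric values are illustrative.

```python
import numpy as np

# Hypothetical block behavior archetypes; the labels are illustrative,
# not taken from the source paper.
ARCHETYPES = ["moves_freely", "resists", "does_not_move"]


class ArchetypeBelief:
    """Minimal Bayesian filter over block archetypes given noisy force and
    visual (displacement) measurements, assuming independent Gaussian
    likelihoods whose (mean, std) parameters were fit from exploration data."""

    def __init__(self, force_params, vision_params, prior=None):
        # force_params / vision_params: {archetype: (mean, std)}
        self.force_params = force_params
        self.vision_params = vision_params
        n = len(ARCHETYPES)
        self.belief = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, dtype=float)

    @staticmethod
    def _gaussian(x, mean, std):
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

    def update(self, force, displacement):
        """Fuse one force reading and one visual displacement reading into the belief."""
        likelihood = np.array([
            self._gaussian(force, *self.force_params[a])
            * self._gaussian(displacement, *self.vision_params[a])
            for a in ARCHETYPES
        ])
        posterior = likelihood * self.belief
        self.belief = posterior / posterior.sum()
        return dict(zip(ARCHETYPES, self.belief))


if __name__ == "__main__":
    # Made-up (mean, std) per archetype: push force in N, observed displacement in mm.
    force_params = {"moves_freely": (0.5, 0.3), "resists": (2.0, 0.8), "does_not_move": (4.0, 1.0)}
    vision_params = {"moves_freely": (5.0, 1.5), "resists": (2.0, 1.0), "does_not_move": (0.2, 0.3)}

    belief = ArchetypeBelief(force_params, vision_params)
    print(belief.update(force=3.8, displacement=0.1))  # evidence points to "does_not_move"
```

In the temporal hierarchical model the paper describes, a per-push update like this would additionally propagate beliefs over time and feed a higher-level policy that adjusts both the current action and the overall game strategy.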
