Multimodal Time Series Learning of Robots Based on Distributed and Integrated Modalities: Verification with a Simulator and Actual Robots

We have developed an autonomous robot motion generation model based on distributed and integrated multimodal learning. Because each modality used as a robot sense, such as images, joint angles, and torque, has a different physical meaning and different temporal characteristics, autonomous motion generation with multimodal learning has sometimes failed owing to overfitting to one of the modalities. Inspired by sensory processing in the human brain, our model mirrors the per-sense processing performed in the primary somatosensory cortex and the integrated processing of multiple senses in the association cortex and primary motor cortex. Specifically, the proposed model comprises two types of recurrent neural networks (RNNs): sensory RNNs, each of which learns the time series of a single sense, and a union RNN, which exchanges information with the sensory RNNs and learns to integrate the senses (a minimal sketch of this structure follows). Simulation results on multiple tasks showed that our model handles the modalities appropriately and generates smoother motions with lower jerk than a conventional model. We also demonstrated a chair-assembly task on an actual robot by combining fixed motions with autonomous motions generated by our model.
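
Below is a minimal sketch of the distributed-and-integrated structure described above, assuming a PyTorch implementation. The class name `SensoryUnionRNN`, the layer sizes, the choice of LSTM cells, the feedback of the union state into each sensory RNN, and the next-step prediction heads are illustrative assumptions, not the authors' exact architecture; the sketch only shows the idea of per-modality sensory RNNs communicating with one integrating union RNN.

```python
# Illustrative sketch (assumed PyTorch implementation): one sensory RNN
# per modality, plus a union RNN that integrates the sensory hidden states
# and feeds its own state back to each sensory RNN at every timestep.
import torch
import torch.nn as nn

class SensoryUnionRNN(nn.Module):
    def __init__(self, modality_dims, sensory_hidden=64, union_hidden=32):
        super().__init__()
        self.modalities = list(modality_dims)
        # One sensory RNN per modality; each also receives the union state.
        self.sensory = nn.ModuleDict({
            name: nn.LSTMCell(dim + union_hidden, sensory_hidden)
            for name, dim in modality_dims.items()
        })
        # The union RNN integrates all sensory hidden states.
        self.union = nn.LSTMCell(sensory_hidden * len(modality_dims),
                                 union_hidden)
        # Per-modality heads predict the next-step sensory values
        # (an assumed predictive-learning readout).
        self.heads = nn.ModuleDict({
            name: nn.Linear(sensory_hidden, dim)
            for name, dim in modality_dims.items()
        })
        self.sensory_hidden = sensory_hidden
        self.union_hidden = union_hidden

    def forward(self, inputs):
        # inputs: {modality_name: (batch, seq_len, dim)} time series
        batch, seq_len = next(iter(inputs.values())).shape[:2]
        dev = next(self.parameters()).device
        s_state = {n: (torch.zeros(batch, self.sensory_hidden, device=dev),
                       torch.zeros(batch, self.sensory_hidden, device=dev))
                   for n in self.modalities}
        u_h = torch.zeros(batch, self.union_hidden, device=dev)
        u_c = torch.zeros(batch, self.union_hidden, device=dev)
        preds = {n: [] for n in self.modalities}
        for t in range(seq_len):
            # Each sensory RNN sees its own modality plus the union state.
            for n in self.modalities:
                x = torch.cat([inputs[n][:, t], u_h], dim=-1)
                s_state[n] = self.sensory[n](x, s_state[n])
            # The union RNN integrates all sensory hidden states.
            u_in = torch.cat([s_state[n][0] for n in self.modalities], dim=-1)
            u_h, u_c = self.union(u_in, (u_h, u_c))
            for n in self.modalities:
                preds[n].append(self.heads[n](s_state[n][0]))
        return {n: torch.stack(p, dim=1) for n, p in preds.items()}
```

A hypothetical usage with image features, joint angles, and torque as the three modalities; training would minimize the error between these outputs and the next-step targets of each modality:

```python
model = SensoryUnionRNN({"image_feature": 16, "joint_angle": 7, "torque": 7})
seqs = {"image_feature": torch.randn(2, 50, 16),
        "joint_angle": torch.randn(2, 50, 7),
        "torque": torch.randn(2, 50, 7)}
next_step_preds = model(seqs)  # dict of (batch, seq_len, dim) predictions
```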
