Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

Abstract

Deep Reinforcement Learning (DRL), which can learn complex policies from high-dimensional observations such as images, has been successfully applied to a variety of tasks. It is therefore a promising approach for robots learning daily activities such as washing and folding clothes, cooking, and cleaning, since such tasks are difficult for non-DRL methods, which typically require either (1) direct access to state variables or (2) well-designed hand-engineered features extracted from sensory inputs. However, applying DRL to real robots remains very challenging because conventional DRL algorithms need a huge number of training samples, which are costly to collect on real hardware. To alleviate this problem, we propose two sample-efficient DRL algorithms: Deep P-Network (DPN) and Dueling Deep P-Network (DDPN). The core idea is to combine the nature of a smooth policy update with the automatic feature extraction capability of deep neural networks, enhancing sample efficiency and learning stability with fewer samples. The proposed methods were first evaluated on a simulated robot-arm reaching task in comparison with previous DRL methods, and then applied to two real robotic cloth manipulation tasks with a limited number of samples: (1) flipping a handkerchief and (2) folding a t-shirt. All results suggest that our methods outperform the previous DRL methods.
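To give a concrete sense of what "smooth policy update" means here, the sketch below shows a minimal tabular preference update in the dynamic policy programming style that DPN builds on: action preferences are nudged by a softmax-weighted target instead of the greedy max of standard Q-learning. The function names, the step size `alpha`, and the toy setup are illustrative assumptions, not the paper's DPN/DDPN implementation (which replaces the table with a deep network).

```python
import numpy as np

def boltzmann_policy(prefs, eta):
    # Softmax over the action preferences of a single state
    # (max-shifted for numerical stability).
    z = np.exp(eta * (prefs - prefs.max()))
    return z / z.sum()

def softmax_mean(prefs, eta):
    # Boltzmann-weighted mean of preferences: sum_a pi(a|s) * P(s, a).
    return boltzmann_policy(prefs, eta) @ prefs

def smooth_update(P, s, a, r, s_next, eta=1.0, gamma=0.99, alpha=0.1):
    # Sample-based, DPP-style update of the preference table P[state, action].
    # The target nudges the *preference*, so the induced Boltzmann policy
    # drifts only gradually between iterations (a "smooth" update), rather
    # than jumping with a greedy max operator. alpha is an assumed step size.
    target = r + gamma * softmax_mean(P[s_next], eta) - softmax_mean(P[s], eta)
    P[s, a] += alpha * target
    return P

# Toy usage: 4 states, 2 actions, one transition (s=0, a=1, r=0.5, s'=2).
P = np.zeros((4, 2))
P = smooth_update(P, s=0, a=1, r=0.5, s_next=2)
print(boltzmann_policy(P[0], eta=1.0))  # policy shifts only slightly toward a=1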

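```

Because successive policies are Boltzmann distributions over slowly changing preferences, each new policy stays close to the previous one, which is the property the proposed methods exploit for stable learning from few samples.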