Integrating Multiple Policies for Person-Following Robot Training Using Deep Reinforcement Learning

Given a training environment that follows a Markov decision process for a specific task, a deep reinforcement learning (DRL) agent can find optimal policies that map states of the environment to appropriate actions by repeatedly trying various actions to maximize the training reward. However, the learned policies cannot be reused directly when training for other new tasks, wasting precious time and resources. To solve this problem, we propose a DRL-based method for training an agent that selects, from a set of previously trained optimal policies, the policy appropriate for the current state of the environment, for any given task that can be decomposed into sub-tasks. We apply the proposed method to training a person-following robot, a task that can be broken down into three sub-tasks: navigation, left attending, and right attending. Using the proposed method, the optimal navigation policy obtained in our previous work is integrated with the attending policies trained in this study. We also introduce weight-scheduled action smoothing, which stabilizes the actions generated by the agent during attending-task training. Our experimental results show that the proposed method integrates all sub-policies through the action smoothing method even though the navigation and attending policies have dissimilar input structures, different output ranges, and were trained in different ways. Moreover, the proposed method outperforms both training from scratch and training with a transfer learning strategy.
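Neither the selection mechanism nor the smoothing formula is spelled out in the abstract, so the sketch below is only a plausible reading of the pipeline: a small selector network (hypothetical architecture) scores the frozen pre-trained sub-policies, and a weight-scheduled smoother blends each new action with the previous one under an assumed linear schedule. All class names, layer sizes, and the blend form a_t = w_t * a_policy + (1 - w_t) * a_{t-1} are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import torch
import torch.nn as nn


class PolicySelector(nn.Module):
    """Maps the current state to a score per pre-trained sub-policy
    (navigation, left attending, right attending). Architecture and
    layer sizes are illustrative, not from the paper."""

    def __init__(self, state_dim: int, n_policies: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_policies),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # scores over the sub-policies


class WeightScheduledActionSmoother:
    """Blends each new action with the previous smoothed action:
        a_t = w_t * a_policy + (1 - w_t) * a_{t-1}
    with w_t annealed linearly from w_start to w_end, so early noisy
    actions are damped and the mature policy eventually acts unfiltered.
    The blend form and the linear schedule are assumptions."""

    def __init__(self, w_start: float = 0.2, w_end: float = 1.0,
                 schedule_steps: int = 100_000):
        self.w_start, self.w_end = w_start, w_end
        self.schedule_steps = schedule_steps
        self.step = 0
        self.prev_action = None

    def smooth(self, action: np.ndarray) -> np.ndarray:
        frac = min(self.step / self.schedule_steps, 1.0)
        w = self.w_start + frac * (self.w_end - self.w_start)
        if self.prev_action is None:
            out = action
        else:
            out = w * action + (1.0 - w) * self.prev_action
        self.prev_action = out
        self.step += 1
        return out


# One illustrative control step. The stand-in sub-policies below map a
# state to a (linear, angular) velocity pair; real ones would be the
# frozen navigation / attending networks.
sub_policies = [lambda s: np.zeros(2) for _ in range(3)]
selector = PolicySelector(state_dim=32)
smoother = WeightScheduledActionSmoother()

state = torch.zeros(32)
idx = selector(state).argmax().item()        # pick a sub-policy for this state
action = sub_policies[idx](state.numpy())    # frozen sub-policy produces an action
action = smoother.smooth(action)             # stabilized command sent to the robot
```

In the proposed method the selector itself would be trained with DRL; the argmax here merely stands in for that learned choice.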
