Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration

This study is motivated by a new class of challenging control problems: automatic tuning of robotic knee control parameters with a human in the loop. In addition to the inter-person and intra-person variances inherent in such human-robot systems, the design must account for human user safety and stability as well as data and time efficiency. By data and time efficiency we mean that learning and adaptation of the device configuration take place within a countable number of gait cycles, or within minutes. As solutions to this problem are not readily available, we propose a new policy-iteration-based adaptive dynamic programming algorithm, the flexible policy iteration (FPI). We show that FPI solves for the control parameters via (weighted) least-squares while incorporating data flexibly and utilizing prior knowledge. We provide analyses of the stability of the control policies, of the non-increasing value functions that converge to Bellman optimality, and of error bounds on the iterative value functions subject to approximation errors. We extensively evaluated the performance of FPI in OpenSim, a well-established locomotion simulator, under realistic conditions. By comparing FPI with three other comparable algorithms, we demonstrate that FPI is a feasible, data- and time-efficient design approach for adapting the control parameters of the prosthetic knee to co-adapt with the human user, who also exerts control over the prosthesis. As the proposed FPI algorithm does not require stringent constraints or peculiar assumptions, we expect this reinforcement learning controller to be applicable to other challenging adaptive optimal control problems.
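The abstract does not spell out the FPI update itself, but its description, policy iteration with a (weighted) least-squares solve over flexibly incorporated batch data, matches the familiar least-squares policy iteration template. The sketch below illustrates that template only: one weighted LSTD-Q fit per policy-evaluation step, followed by a greedy policy improvement. The feature map, the toy one-dimensional dynamics standing in for a tunable control parameter, and all names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def features(s, a):
    # Simple polynomial features of state and action (illustrative choice,
    # not the feature construction used by FPI).
    return np.array([1.0, s, a, s * a, s ** 2, a ** 2])

def weighted_lstdq(transitions, policy, weights, gamma=0.95):
    """Fit Q(s, a) = phi(s, a)^T theta for the given policy by weighted
    least-squares on the projected Bellman equation (LSTD-Q style)."""
    k = features(0.0, 0.0).size
    A = np.zeros((k, k))
    b = np.zeros(k)
    for (s, a, r, s_next), w in zip(transitions, weights):
        phi = features(s, a)
        phi_next = features(s_next, policy(s_next))
        A += w * np.outer(phi, phi - gamma * phi_next)
        b += w * r * phi
    # Small ridge term keeps the solve well-posed on limited data.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def greedy_policy(theta, actions):
    # Policy improvement: pick the action maximizing the fitted Q-function.
    return lambda s: max(actions, key=lambda a: features(s, a) @ theta)

# Toy 1-D regulation task standing in for a tunable prosthesis parameter.
rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 5)
transitions = []
for _ in range(500):
    s = rng.uniform(-2.0, 2.0)
    a = rng.choice(actions)
    s_next = 0.9 * s + a + 0.05 * rng.standard_normal()
    r = -(s_next ** 2 + 0.1 * a ** 2)   # quadratic cost as negative reward
    transitions.append((s, a, r, s_next))

weights = np.ones(len(transitions))      # uniform here; see note below
policy = lambda s: 0.0                   # initial (stabilizing) policy
for _ in range(10):                      # policy iteration loop
    theta = weighted_lstdq(transitions, policy, weights)
    policy = greedy_policy(theta, actions)
```

The per-sample weights are where the "flexible" data incorporation would enter: rather than the uniform weights used above, one could up-weight recent gait cycles or transitions consistent with prior knowledge of the device, which is the spirit, though not necessarily the letter, of the weighting described in the abstract.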
