Artificial Intelligence for Prosthetics - challenge solutions

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with the goal of matching a given time-varying velocity vector. In this paper, the top participants describe their algorithms. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms, for example by dividing the task into subtasks, learning low-level control, or incorporating expert knowledge and using imitation learning.
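
As an illustration of two of these heuristics, the sketch below combines frame skipping with a simple velocity-matching shaping term in a gym-style environment wrapper. It is a minimal example written against an assumed environment interface: the observation keys body_vel and target_vel, the wrapper name, and the shaping weight are hypothetical and do not reproduce any particular team's implementation.

    import numpy as np

    class SkipAndShapeWrapper:
        """Illustrative wrapper combining frame skipping with reward shaping.

        Assumes a gym-style environment whose observation dictionary exposes
        the body velocity and the requested target velocity; the key names
        below are hypothetical.
        """

        def __init__(self, env, frame_skip=4, shaping_weight=0.1):
            self.env = env
            self.frame_skip = frame_skip          # repeat each action this many simulation steps
            self.shaping_weight = shaping_weight  # weight of the velocity-matching bonus

        def reset(self):
            return self.env.reset()

        def step(self, action):
            total_reward, done = 0.0, False
            for _ in range(self.frame_skip):
                obs, reward, done, info = self.env.step(action)
                # Shaping term: penalize deviation from the requested velocity vector.
                velocity_error = np.linalg.norm(
                    np.asarray(obs["body_vel"]) - np.asarray(obs["target_vel"])
                )
                total_reward += reward - self.shaping_weight * velocity_error
                if done:
                    break
            return obs, total_reward, done, info

Frame skipping shortens the effective horizon and reduces the number of policy queries per episode, while the shaping term gives a denser learning signal than the environment reward alone.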

Nihat Engin Toklu | Wojciech Jaśkowski | Jeremy D. Watson | Zeyang Yu | Daniel Kudenko | Sergey Levine | Hongsheng Zeng | Xu Hu | Odd Rune Lykkebø | Quan Yuan | Marcel Salathé | Rongzhong Lian | Fan Wang | Zehong Hu | Pingchuan Ma | Minghui Qiu | Hao Tian | Zhen Wang | Scott L. Delp | Bo Zhou | Wenxin Li | Rupesh Kumar Srivastava | Shauharda Khadka | Ivan Sosin | Aleksei Shpilman | Oleksii Hrinchuk | Lukasz Kidzinski | Aditya Bhatt | Evren Tumer | Peng Peng | Yunsheng Tian | Yinyin Liu | Ruihan Yang | Sharada Prasanna Mohanty | Somdeb Majumdar | Jun Huang | Sean F. Carroll | Jennifer L. Hicks | Pranav Shyam | Aleksandra Malysheva | Zach Dwiel | Garrett Andersen | Carmichael F. Ong | Sergey Kolesnikov | Penghui Qi | Lance Rane | Anton Pechenko | Mattias Ljungström | Oleg Svidchenko | Zhengfei Wang
