Autonomous Functional Movements in a Tendon-Driven Limb via Limited Experience

Robots will become ubiquitously useful only when they can teach themselves to perform different tasks in just a few attempts, even with complex bodies and in dynamic environments. Vertebrates use sparse trial and error to learn multiple tasks despite their intricate tendon-driven anatomies, which are particularly hard to control because they are simultaneously nonlinear, under-determined and over-determined. We demonstrate, in simulation and hardware, how a model-free, open-loop approach enables few-shot autonomous learning of effective movements in a three-tendon, two-joint limb. A short period of motor babbling creates an initial inverse map; functional habits are then built by reinforcing high-reward behaviour and refining the inverse map in a movement's neighbourhood. This biologically plausible algorithm, which we call G2P (general to particular), can potentially enable quick, robust and versatile adaptation in robots, and may shed light on the foundations of the enviable functional versatility of organisms.

To perform complex tasks, robots need to learn the relationship between their bodies and dynamic environments. A biologically plausible approach to hardware and software design shows that a robotic tendon-driven limb can make effective movements after only a short period of learning.
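The two-phase loop described in the abstract can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the `limb` function, the linear least-squares inverse map (the paper uses a multilayer perceptron) and the exploration noise scale are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "limb": maps 3 tendon activations to 2 joint angles.
# Stands in for the physical tendon-driven limb, which is not modelled here.
def limb(activations):
    W = np.array([[1.0, -0.5, 0.2],
                  [0.3,  0.8, -0.6]])
    return np.tanh(W @ activations)

# Phase 1: motor babbling -- random activations and the kinematics they produce.
A = rng.uniform(0, 1, size=(100, 3))      # tendon activations
K = np.array([limb(a) for a in A])        # resulting joint angles

# "General" inverse map: joint angles -> activations, fit by least squares.
M, *_ = np.linalg.lstsq(K, A, rcond=None)

def inverse_map(target):
    return np.clip(target @ M, 0, 1)

# Phase 2: attempt a movement, reinforce high-reward attempts, and refine
# the map with data from the movement's neighbourhood ("general to particular").
target = np.array([0.4, -0.2])            # desired joint angles
best_err = np.inf
for _ in range(20):
    a = np.clip(inverse_map(target) + rng.normal(0, 0.05, size=3), 0, 1)
    err = np.linalg.norm(limb(a) - target)
    if err < best_err:                    # reward here = smaller tracking error
        best_err = err
        A = np.vstack([A, a])             # add the "particular" experience
        K = np.vstack([K, limb(a)])
        M, *_ = np.linalg.lstsq(K, A, rcond=None)
```

The key design choice mirrored here is that refinement only ever adds data near successful movements, so the map shifts from a coarse global fit toward the task at hand without a model of the limb or closed-loop feedback.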
