Reinforcement learning for stabilizing an inverted pendulum naturally leads to intermittent feedback control as in human quiet standing

Intermittent feedback control is a promising strategy for stabilizing human upright stance and an alternative to the standard time-continuous stiffness control. Here we show that such an intermittent controller can be acquired naturally through reinforcement learning. To this end, we used a single inverted pendulum model of the upright posture and a very simple reward function that penalizes the controller when the pendulum falls or changes its position in the state space. We found that the acquired feedback controller exhibits a hallmark of the intermittent feedback control strategy: the active feedback is switched off intermittently when the state of the pendulum is near the stable manifold of the upright equilibrium, which is an unstable saddle point of the pendulum without active control. Switching off the controller there allows the system to exploit the dynamics that transiently converge toward the unstable upright position without any active feedback. We then speculate about a possible physiological mechanism for such reinforcement learning and suggest that it may be related to neural activity in the pedunculopontine tegmental nucleus (PPN) of the brainstem. This hypothesis is supported by recent evidence indicating that the PPN may play critical roles in the generation and regulation of postural tonus, in reward prediction, and in the postural instability of patients with Parkinson's disease.
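The role of the stable manifold can be illustrated with a minimal numerical sketch. It is not the model or the learning algorithm used in this study. It assumes a linearized single inverted pendulum with passive stiffness below the critical value, and all parameter values, the function names (`saddle_directions`, `simulate_off`, `reward`) and the penalty magnitudes are hypothetical choices made only for illustration. The sketch computes the stable and unstable directions of the saddle, compares uncontrolled trajectories that start on and off the stable manifold, and gives one possible reading of a reward that penalizes falling and changes of state.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the paper).
M, H, G = 60.0, 1.0, 9.81   # mass [kg], centre-of-mass height [m], gravity [m/s^2]
I = M * H**2                # moment of inertia about the ankle [kg m^2]
K = 0.8 * M * G * H         # passive stiffness, below the critical value m*g*h
B = 4.0                     # passive damping [N m s/rad]

# Linearized dynamics with the active controller switched off:
# dx/dt = A x, with state x = (theta, omega).
A = np.array([[0.0, 1.0],
              [(M * G * H - K) / I, -B / I]])


def saddle_directions(A):
    """Return (lambda_s, v_s, lambda_u, v_u) for a 2x2 saddle-type matrix."""
    vals, vecs = np.linalg.eig(A)
    vals, vecs = vals.real, vecs.real
    s, u = np.argmin(vals), np.argmax(vals)
    return vals[s], vecs[:, s], vals[u], vecs[:, u]


def simulate_off(x0, A, dt=0.001, steps=2000):
    """Integrate the uncontrolled dynamics with forward Euler; return the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * (A @ x)
    return x


def reward(x, x_prev, fall_angle=0.2, fall_penalty=-100.0,
           move_penalty=-1.0, tol=1e-3):
    """Hypothetical reward: large penalty for a fall, small penalty for any state change."""
    x, x_prev = np.asarray(x, dtype=float), np.asarray(x_prev, dtype=float)
    if abs(x[0]) > fall_angle:
        return fall_penalty
    if np.linalg.norm(x - x_prev) > tol:
        return move_penalty
    return 0.0


lam_s, v_s, lam_u, v_u = saddle_directions(A)

# Start at the same tilt angle, once on the stable manifold (omega = lam_s * theta)
# and once at rest, which lies off the manifold.
theta0 = 0.05
on_manifold = simulate_off([theta0, lam_s * theta0], A)
off_manifold = simulate_off([theta0, 0.0], A)

print("stable / unstable eigenvalues:", lam_s, lam_u)
print("final tilt, started on the stable manifold:", on_manifold[0])
print("final tilt, started off the stable manifold:", off_manifold[0])
```

Under these assumptions, the trajectory that starts on the stable manifold approaches the upright position without any active torque, whereas the one that starts off the manifold diverges. This is the property that an intermittent controller exploits by switching off near the manifold.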
