Glider soaring via reinforcement learning in the field

Soaring birds often rely on ascending thermal plumes (thermals) in the atmosphere as they search for prey or migrate across large distances¹⁻⁴. The landscape of convective currents is rugged and shifts on timescales of a few minutes as thermals constantly form, disintegrate or are transported away by the wind⁵,⁶. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning⁷ provides an appropriate framework in which to identify an effective navigational strategy as a sequence of decisions made in response to environmental cues. Here we use reinforcement learning to train a glider in the field to navigate atmospheric thermals autonomously. We equipped a glider of two-metre wingspan with a flight controller that precisely controlled the bank angle and pitch, modulating these at intervals with the aim of gaining as much lift as possible. A navigational strategy was determined solely from the glider’s pooled experiences, collected over several days in the field. The strategy relies on on-board methods to accurately estimate the local vertical wind accelerations and the roll-wise torques on the glider, which serve as navigational cues. We establish the validity of our learned flight policy through field experiments, numerical simulations and estimates of the noise in measurements caused by atmospheric turbulence. Our results highlight the role of vertical wind accelerations and roll-wise torques as effective mechanosensory cues for soaring birds and provide a navigational strategy that is directly applicable to the development of autonomous soaring vehicles.

A reinforcement learning approach allows a suitably equipped glider to navigate thermal plumes autonomously in an open field.
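The decision loop described in the abstract (read two cues, adjust the bank angle, observe the lift gained) can be illustrated with a small tabular sketch. This is not the authors' implementation: the Q-learning update, the three-level cue discretisation, the bank-angle grid, the increments, and the values of `alpha`, `gamma` and the reward are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical discretisation (not from the source): each cue is reduced
# to its sign, and the bank angle lives on a small assumed grid.
CUE_LEVELS = (-1, 0, 1)               # negative, near zero, positive
BANK_ANGLES = (-30, -15, 0, 15, 30)   # degrees (assumed)
ACTIONS = (-15, 0, 15)                # bank-angle change per decision step (assumed)


def next_bank(bank, action):
    """Apply a bank-angle increment, clipped to the assumed grid."""
    return max(BANK_ANGLES[0], min(BANK_ANGLES[-1], bank + action))


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over bank-angle increments."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.98):
    """One tabular Q-learning update; returns the updated value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# One illustrative step. State = (vertical wind acceleration cue,
# roll-wise torque cue, current bank angle). The reward stands in for
# the lift gained over the step; its value here is made up.
q = defaultdict(float)
state = (1, -1, 0)
action = 15
next_state = (1, 0, next_bank(state[2], action))
q_update(q, state, action, reward=1.0, next_state=next_state)
```

In this sketch, experience pooled over many flights would simply be replayed through `q_update`, and the learned policy is the greedy action for each cue combination.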

[1]  J. Lumley, et al. A First Course in Turbulence, 1972.

[2]  D. Lenschow, et al. The role of thermals in the convective boundary layer, 1980.

[3]  C. Pennycuick. Thermal Soaring Compared in Three Dissimilar Tropical Bird Species, Fregata magnificens, Pelecanus occidentalis and Coragyps atratus, 1983.

[4]  S. Pope, et al. Lagrangian statistics from direct numerical simulations of isotropic turbulence, 1989, Journal of Fluid Mechanics.

[5]  J. Garratt. The Atmospheric Boundary Layer, 1992.

[6]  Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.

[7]  U. Frisch. Turbulence: The Legacy of A. N. Kolmogorov, 1996.

[8]  John H. Cochrane, et al. MacCready Theory with Uncertain Lift and Limited Altitude, 1999.

[9]  Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[10]  G. Voth, et al. Measurement of particle accelerations in fully developed turbulence, 2001, Journal of Fluid Mechanics.

[11]  J. Shamoun‐Baranes, et al. Differential Use of Thermal Convection by Soaring Birds over Central Israel, 2003.

[12]  Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[13]  Michael J. Allen. Guidance and Control of an Autonomous Soaring Vehicle with Flight Test Results, 2007.

[14]  Jean-Arcady Meyer, et al. Soaring behaviors in UAVs: 'animat' design methodology and current results, 2007.

[15]  Daniel J. Edwards. Implementation Details and Flight Test Results of an Autonomous Soaring Controller, 2008.

[16]  Tamás Vicsek, et al. Thermal soaring flight of birds and unmanned aerial vehicles, 2010, Bioinspiration & biomimetics.

[17]  Daniel J. Edwards, et al. Autonomous Soaring: The Montague Cross-Country Challenge, 2010.

[18]  Ran Nathan, et al. The gliding speed of migrating birds: slow and safe or fast and risky?, 2014, Ecology letters.

[19]  Nicholas R. J. Lawrance, et al. Learning to soar: Resource-constrained exploration in reinforcement learning, 2015, Int. J. Robotics Res.

[20]  Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[21]  François Laviolette, et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.

[22]  H. Weimerskirch, et al. Frigate birds track atmospheric conditions over months-long transoceanic flights, 2016, Science.

[23]  Gautam Reddy, et al. Learning to soar in turbulent environments, 2016, Proceedings of the National Academy of Sciences.

[24]  Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.

[25]  Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.