Zermelo's problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning

Finding the path that minimizes the time needed to navigate between two given points in a fluid flow is known as Zermelo's problem. Here, we investigate it using a Reinforcement Learning (RL) approach for the case of a vessel that has a slip velocity of fixed intensity, Vs, but variable direction, and that navigates in a 2D turbulent sea. We show that an Actor-Critic RL algorithm is able to find quasi-optimal solutions for both time-independent and chaotically evolving flow configurations. For the frozen (time-independent) case, we also compare the results with strategies obtained analytically from continuous Optimal Navigation (ON) protocols. We show that, for our application, ON solutions are unstable over the typical duration of the navigation process and are therefore not useful in practice. On the other hand, RL solutions are much more robust with respect to small changes in the initial conditions and to external noise, even when Vs is much smaller than the maximum flow velocity. Furthermore, we show how the RL approach is able to take advantage of the flow properties in order to reach the target, especially when the steering speed is small.
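
For context, the standard formulation behind both the ON and the RL strategies can be sketched as follows (the notation is illustrative and not necessarily the one adopted in the paper). The vessel position X(t) evolves under the sum of the carrier flow and the steering term of fixed intensity Vs,

\[
\dot{\mathbf{X}}(t) = \mathbf{u}\big(\mathbf{X}(t),t\big) + V_s\,\big(\cos\theta(t),\,\sin\theta(t)\big),
\]

where the heading angle \(\theta(t)\) is the only control. For a stationary flow \(\mathbf{u}=(u,v)\), the classical Zermelo (ON) solution obtained from optimal control theory steers the heading according to

\[
\dot{\theta} = \sin^2\!\theta\,\partial_x v + \sin\theta\cos\theta\,\big(\partial_x u - \partial_y v\big) - \cos^2\!\theta\,\partial_y u,
\]

which has to be integrated together with the trajectory starting from an initial heading fixed by a shooting procedure. The sensitivity of this boundary-value problem to the initial heading and to perturbations along the path is one way to read the instability of the ON solutions reported above, in contrast to the state-feedback policies learned by RL.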
