Autonomous closed-loop guidance using reinforcement learning in a low-thrust, multi-body dynamical environment

Abstract: Onboard autonomy is an essential component in enabling increasingly complex missions into deep space. In nonlinear dynamical environments, constructing computationally efficient guidance strategies is challenging: many traditional approaches rely either on simplifying assumptions in the dynamical model or on abundant computational resources. This research effort employs reinforcement learning, a subset of machine learning, to produce a ‘lightweight’ closed-loop controller that is potentially suitable for onboard low-thrust guidance in challenging dynamical regions of space. The results demonstrate the controller’s ability both to guide a spacecraft directly despite large initial deviations and to augment a traditional targeting guidance approach. The proposed controller functions without direct knowledge of the dynamical model; direct interaction with the nonlinear equations of motion creates a flexible learning scheme that is not limited to a single force model, mission scenario, or spacecraft. The learning process leverages high-performance computing to train a closed-loop neural network controller. This controller may be employed onboard to autonomously generate low-thrust control profiles in real time without imposing a heavy workload on a flight computer. Control feasibility is demonstrated through sample transfers between Lyapunov orbits in the Earth-Moon system. The sample low-thrust controller exhibits remarkable robustness to perturbations and generalizes effectively to nearby motion. Finally, the flexibility of the learning framework is demonstrated across a range of mission scenarios and low-thrust engine types.
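The training setup summarized above pairs a neural-network policy with direct numerical integration of the multi-body equations of motion, so the agent learns from the full nonlinear dynamics rather than an approximated model. As a rough illustration only, the minimal sketch below wraps the planar circular restricted three-body problem (CR3BP), augmented with a low-thrust acceleration term, in a gym-style episode loop; the environment class, reward shaping, state values, and parameter choices are illustrative assumptions and not the authors' implementation.

```python
# Minimal sketch: planar CR3BP dynamics with a low-thrust control term,
# exposed as a gym-style episode for reinforcement learning.
# All names, reward terms, and numerical values are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

MU = 0.012150585  # Earth-Moon mass ratio (nondimensional)

def cr3bp_thrust_eom(t, s, ux, uy, a_max):
    """Planar CR3BP equations of motion in the rotating frame, with a
    constant low-thrust acceleration applied over one control segment."""
    x, y, vx, vy = s
    r1 = np.sqrt((x + MU) ** 2 + y ** 2)        # distance to Earth
    r2 = np.sqrt((x - 1 + MU) ** 2 + y ** 2)    # distance to Moon
    ax = (x + 2.0 * vy
          - (1.0 - MU) * (x + MU) / r1 ** 3
          - MU * (x - 1.0 + MU) / r2 ** 3
          + a_max * ux)
    ay = (y - 2.0 * vx
          - (1.0 - MU) * y / r1 ** 3
          - MU * y / r2 ** 3
          + a_max * uy)
    return [vx, vy, ax, ay]

class TransferEnv:
    """Hypothetical episode: steer toward a reference state on the target
    Lyapunov orbit by selecting a bounded thrust direction each segment."""

    def __init__(self, s0, s_target, a_max=1e-2, dt=0.05, max_steps=200):
        self.s0 = np.array(s0, dtype=float)
        self.s_target = np.array(s_target, dtype=float)
        self.a_max, self.dt, self.max_steps = a_max, dt, max_steps

    def reset(self):
        self.s, self.k = self.s0.copy(), 0
        return self.s.copy()

    def step(self, action):
        ux, uy = np.clip(action, -1.0, 1.0)     # bounded thrust components
        sol = solve_ivp(cr3bp_thrust_eom, (0.0, self.dt), self.s,
                        args=(ux, uy, self.a_max), rtol=1e-9, atol=1e-12)
        self.s, self.k = sol.y[:, -1], self.k + 1
        # Assumed reward: penalize deviation from the target state.
        reward = -np.linalg.norm(self.s - self.s_target)
        done = self.k >= self.max_steps
        return self.s.copy(), reward, done

# Example rollout with a random policy (illustrative initial/target states).
env = TransferEnv(s0=[0.83, 0.0, 0.0, 0.25], s_target=[0.82, 0.0, 0.0, 0.30])
state, done = env.reset(), False
while not done:
    action = np.random.uniform(-1.0, 1.0, size=2)  # placeholder for the policy
    state, reward, done = env.step(action)
```

In the framework the abstract describes, the random action above would instead come from the trained closed-loop neural network, and a reinforcement learning algorithm would update the network weights from many such rollouts executed in parallel on high-performance computing resources.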
