Rocket Powered Landing Guidance Using Proximal Policy Optimization

Rocket recovery requires advanced guidance algorithms to achieve pinpoint landing while satisfying multiple stringent constraints. In this paper, we design a reinforcement-learning-based guidance law for the powered landing phase of a vertical takeoff and vertical landing reusable rocket. To this end, we apply the proximal policy optimization algorithm to develop a control policy that drives the rocket to land at a specified location. The policy, parameterized by a neural network, is updated by gradient ascent. After extensive training, the learned policy is evaluated in a simulation of the rocket powered landing scenario that accounts for aerodynamic drag, and the results demonstrate that the proposed guidance method can successfully land the rocket from a random initial state.
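
As an illustration of the quantity that gradient ascent maximizes in proximal policy optimization, the following minimal sketch evaluates the standard clipped surrogate objective for a batch of samples. It is not the implementation used in this paper: the function name `ppo_clip_objective`, its argument names, and the clipping value `clip_eps=0.2` are assumptions chosen for the example, and the advantage estimates are taken as given.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (hypothetical helper).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    updated policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples (assumed given).
    clip_eps: clipping range; 0.2 is an assumed, commonly used value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps]
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the updated policy equals the data-collecting policy, every ratio is 1 and the objective reduces to the mean advantage. Once a ratio leaves the clipping range in the direction that would raise the objective, the clipped term takes over, which removes the incentive for overly large policy updates.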
