Paper Information - Rocket Powered Landing Guidance Using Proximal Policy Optimization

Rocket Powered Landing Guidance Using Proximal Policy Optimization

Rocket recovery requires advanced guidance algorithms to achieve pinpoint landing while satisfying multiple stringent constraints. In this paper, we design a reinforcement-learning-based guidance law for the powered landing phase of a vertical takeoff and vertical landing (VTVL) reusable rocket. To this end, we apply the proximal policy optimization (PPO) algorithm to develop a control policy that drives the rocket to land at a specified location. The policy, parameterized by a neural network, is updated by gradient ascent. After extensive training, the learned policy is evaluated in a simulation of the rocket powered landing scenario that accounts for aerodynamic drag, and the results demonstrate that the proposed guidance method can successfully land the rocket from a random initial state.
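The abstract does not give the training details, so the following is only a minimal sketch of the clipped surrogate objective that proximal policy optimization maximizes by gradient ascent. The function name `ppo_clip_objective`, its arguments, and the clipping value `clip_eps=0.2` are illustrative assumptions and are not taken from the paper.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    All names and the default clip_eps are assumptions for illustration.
    new_log_probs / old_log_probs: log-probabilities of the sampled actions
    under the current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two samples: the first action became more likely, the second less likely.
old = [math.log(0.5), math.log(0.5)]
new = [math.log(0.8), math.log(0.2)]
adv = [1.0, -1.0]
value = ppo_clip_objective(new, old, adv)
print(value)  # approximately 0.2
```

In this example both ratios (1.6 and 0.4) fall outside the clipping interval, so both terms are clipped. This is how the objective discourages a single update from moving the policy far from the one that generated the data. In practice the log-probabilities would come from the neural-network policy, and the gradient of this objective with respect to the network parameters would drive the ascent step.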

Yifan Chen | Lin Ma
