Adaptive Guidance and Integrated Navigation with Reinforcement Meta-Learning

Abstract This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four challenging environments with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of an RL meta-learning-optimized policy to implement a guidance law using observations consisting only of Doppler radar altimeter readings in a Mars landing environment, and of LIDAR altimeter readings in an asteroid landing environment, thus integrating guidance and navigation.
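To make the recurrent-policy idea concrete, the following is a minimal sketch of a Gaussian policy with a GRU layer (PyTorch is an assumption; the paper does not specify a framework, and all layer sizes and observation/action dimensions here are illustrative). The GRU hidden state is threaded through the episode, so sampled actions are conditioned on the trajectory observed so far; this is the mechanism by which a meta-learned policy can adapt in flight, for example to a degraded engine, without any weight updates.

# Minimal sketch (not the authors' implementation) of a recurrent
# Gaussian policy for reinforcement meta-learning. The GRU hidden
# state carries information across time steps, letting the deployed
# policy adapt to unobserved dynamics at inference time.
import torch
import torch.nn as nn

class RecurrentGaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.mu = nn.Linear(hidden_dim, act_dim)            # mean of the action distribution
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # state-independent log std

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden: (1, batch, hidden_dim) or None
        x = torch.tanh(self.encoder(obs_seq))
        x, hidden = self.gru(x, hidden)
        return self.mu(x), self.log_std.exp(), hidden

# At deployment the hidden state is threaded through the episode,
# so behavior is conditioned on everything observed so far:
policy = RecurrentGaussianPolicy(obs_dim=6, act_dim=3)
h = None
obs = torch.zeros(1, 1, 6)                    # placeholder observation
for _ in range(10):
    mu, std, h = policy(obs, h)
    action = mu[:, -1] + std * torch.randn_like(mu[:, -1])  # sample the Gaussian head
    # obs = env.step(action) ...              # environment interaction omitted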

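For context on the baseline: DR/DV refers to the energy-optimal planetary landing law of D'Souza. A sketch of its commonly cited form, assuming a target-relative position \(\mathbf{r}\) and velocity \(\mathbf{v}\) with desired final state at the origin and constant gravitational acceleration \(\mathbf{g}\):

\[
\mathbf{a}_c \;=\; -\frac{4\,\mathbf{v}}{t_{go}} \;-\; \frac{6\,\mathbf{r}}{t_{go}^{2}} \;-\; \mathbf{g},
\]

where the time-to-go \(t_{go}\) is taken as the positive real root of a quartic polynomial that follows from the optimality condition on the cost. Because the law assumes known dynamics and gravity, mismodeled or varying forces degrade its accuracy, which motivates the adaptive comparison in this paper.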