Bridging Heuristic and Deep Learning Approaches to Sensor Tasking

Space is becoming a more crowded and contested domain, yet the techniques used to task the sensors that monitor this environment have not changed significantly since James Miller's marginal analysis technique was implemented in the Special Perturbations (SP) Tasker in 2007. Centralized tasker/scheduler approaches have used a Markov Decision Process (MDP) formulation, but myopic solutions fail to account for future states, and non-myopic solutions tend to be computationally infeasible at scale. Linares and Furfaro proposed solving an MDP formulation of the Sensor Allocation Problem (SAP) using Deep Reinforcement Learning (DRL). DRL has been instrumental in solving many high-dimensional control problems previously considered too complex to solve at an expert level, including Go, Atari 2600 games, Dota 2, StarCraft II and autonomous driving. Linares and Furfaro showed that DRL could converge on effective policies for sets of up to 300 objects in the same orbital plane. Jones extended that work to the full three-dimensional case, with objects in diverse orbits. However, DRL methods can require significant training time when learning from scratch, without prior knowledge. This paper builds on past work by applying imitation learning to bootstrap DRL methods with existing heuristic solutions. We show that a Demonstration-Guided DRL (DG-DRL) approach can effectively replicate a near-optimal tasker's performance using trajectories from a sub-optimal heuristic. Further, we show that our approach avoids the poor initial performance typical of online DRL approaches. Code is available as an open-source library at: https://github.com/AshHarvey/ssa-gym