Policy Gradient for Observer Trajectory Planning with Application in Multi-target Tracking Problems

Tracking multiple moving targets with bearing-only measurements is challenging because the observer must follow a trajectory that satisfies the observability conditions of the estimation problem. This work formulates Observer Trajectory Planning (OTP) as a continuous control problem and proposes reinforcement learning as a solution. The proposed architecture is a model-independent framework that estimates the target states and tracks multiple targets in a realistic scenario in which the agent has no prior information about the targets' initial positions and velocities.
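As a rough illustration of this formulation (not the authors' implementation), the sketch below casts OTP as a continuous-control problem: a Gaussian policy maps recent bearing measurements and the observer heading to a heading-rate action, a single constant-velocity target supplies noisy bearings, and the policy parameters are updated with a vanilla policy gradient (REINFORCE). The environment dynamics, the reward (a range-based stand-in rather than an information- or estimation-error criterion), and all parameter values are illustrative assumptions.

```python
# Minimal sketch of OTP as a continuous-control RL problem (illustrative only).
# The policy maps the two most recent noisy bearings and the current heading to
# a heading-rate action; parameters are updated with REINFORCE. Dynamics,
# reward, and constants below are assumptions, not details from the paper.
import numpy as np

rng = np.random.default_rng(0)
DT = 1.0           # time step [s]
SPEED = 5.0        # constant observer speed [m/s]
SIGMA_B = 0.01     # bearing-noise standard deviation [rad]
SIGMA_A = 0.2      # exploration noise of the Gaussian policy


def simulate_episode(theta, steps=50):
    """Roll out one episode; return per-step log-prob gradients and rewards."""
    obs_pos = np.zeros(2)
    heading = 0.0
    tgt_pos = rng.uniform(200.0, 400.0, size=2)   # unknown to the agent
    tgt_vel = rng.uniform(-2.0, 2.0, size=2)      # constant-velocity target
    grads, rewards, bearings = [], [], [0.0, 0.0]

    for _ in range(steps):
        # Policy input: last two noisy bearings, current heading, bias term.
        x = np.array([bearings[-1], bearings[-2], heading, 1.0])
        mean = np.tanh(theta @ x)                       # mean heading rate in [-1, 1]
        a = mean + SIGMA_A * rng.standard_normal()      # sampled action
        # Gradient of log N(a; mean, SIGMA_A^2) w.r.t. theta.
        grads.append((a - mean) / SIGMA_A**2 * (1 - mean**2) * x)

        # Observer kinematics (turn, then advance at constant speed).
        heading += 0.3 * a * DT
        obs_pos += SPEED * DT * np.array([np.cos(heading), np.sin(heading)])
        tgt_pos = tgt_pos + tgt_vel * DT

        # Bearing-only measurement of the target.
        rel = tgt_pos - obs_pos
        bearings.append(np.arctan2(rel[1], rel[0]) + SIGMA_B * rng.standard_normal())

        # Illustrative reward: encourage geometries that reduce range to the
        # target (a stand-in for an information-based reward).
        rewards.append(-np.linalg.norm(rel) / 1000.0)

    return grads, rewards


def train(iterations=200, lr=1e-2, gamma=0.99):
    theta = np.zeros(4)
    for _ in range(iterations):
        grads, rewards = simulate_episode(theta)
        # Discounted reward-to-go from each step.
        G, returns = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            returns.append(G)
        returns = np.array(returns[::-1])
        returns -= returns.mean()                 # simple baseline
        # REINFORCE update: theta += lr * sum_t G_t * grad log pi(a_t | s_t).
        theta += lr * sum(g * R for g, R in zip(grads, returns))
    return theta


if __name__ == "__main__":
    print("trained policy parameters:", train())
```

In this toy setting the policy can only shape the observer's heading; extending it to multiple targets would require a state estimator (e.g., an EKF per target) whose outputs feed the policy and whose uncertainty defines the reward, as the abstract's model-independent framework suggests.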
