Average Reward Dynamic Programming Applied to a Persistent Visitation and Data Delivery Problem
We are interested in the persistent surveillance of an area of interest composed of stations/data nodes that need to be visited in a cyclic manner. The data collection task is undertaken by a UAV that autonomously executes the mission. In addition to the geographically distributed stations, the scenario also includes a central depot, to which the data collected from the different nodes must be delivered. In this context, the performance criterion, in addition to a desired minimal cycle time, also entails minimizing the delay in delivering the data collected from each node to the depot. Each node has a priority/weight associated with it that characterizes the relative importance of timely data delivery among the nodes. We pose the problem as an average/cycle reward maximization problem, where the UAV gains a reward that is a decreasing function of the weighted delay in data delivery from the nodes. Since we aim to maximize the average reward, the solution also favors a shorter overall cycle time. In a cycle, each station is visited exactly once; however, we allow the UAV to visit the depot more than once per cycle. Evidently, this allows for quicker delivery of data from higher-priority nodes. We apply results from average-reward stochastic dynamic programming to our deterministic case and solve the problem using linear programming. We also discuss the special case of no penalty on delivery delay, in which the problem collapses to the well-known metric Traveling Salesman Problem.
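To make the solution approach concrete, the following is a minimal sketch of the standard average-reward linear program specialized to a deterministic transition graph. It is not the authors' formulation (which rewards weighted delivery delays and includes a depot); the toy graph, edge rewards, and variable names are illustrative assumptions. The decision variables are long-run edge frequencies, and the optimum selects a maximum-mean-reward cycle, mirroring the degenerate case in which only cycle time matters.

```python
# Sketch only: generic average-reward LP for a deterministic MDP, where
# states are graph nodes, actions are outgoing edges, and the reward on
# an edge is assumed to be the negative travel time, so maximizing the
# average reward picks out a minimum-mean-time cycle.
import numpy as np
from scipy.optimize import linprog

# Hypothetical 4-node example: edges given as (from, to, reward).
edges = [
    (0, 1, -2.0), (1, 2, -3.0), (2, 0, -4.0),   # cycle 0-1-2-0, mean reward -3
    (0, 3, -1.0), (3, 0, -1.0),                 # cycle 0-3-0,   mean reward -1
]
n_states = 4
n_edges = len(edges)

# Variables x_e = long-run frequency of traversing edge e.
# Flow balance: for each state, outflow minus inflow equals zero.
A_eq = np.zeros((n_states + 1, n_edges))
for e, (u, v, _) in enumerate(edges):
    A_eq[u, e] += 1.0   # outflow from u
    A_eq[v, e] -= 1.0   # inflow to v
# Normalization: frequencies sum to one.
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# linprog minimizes, so negate the rewards to maximize the average reward.
c = -np.array([r for (_, _, r) in edges])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_edges)

print("optimal average reward:", -res.fun)    # -1.0, i.e. the 0-3-0 cycle
print("edge frequencies:", np.round(res.x, 3))
```

Any feasible point of this LP is a convex combination of simple cycles, so the optimal basic solution concentrates its frequency on a single best cycle; in the full problem the reward additionally penalizes weighted data-delivery delay, which is what makes repeated depot visits within a cycle attractive.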