Air-Combat Strategy Using Approximate Dynamic Programming

Unmanned aircraft systems have the potential to perform many of the dangerous missions currently flown by manned aircraft, yet the complexity of some tasks, such as air combat, has so far precluded unmanned aircraft systems from carrying out these missions autonomously. This paper presents a formulation of a level-flight, fixed-velocity, one-on-one air-combat maneuvering problem and an approximate dynamic programming approach for computing an efficient approximation of the optimal policy. In the version of the problem formulation considered, the aircraft learning the optimal policy is given a slight performance advantage. The approximate dynamic programming approach provides a fast response to a rapidly changing tactical situation, long planning horizons, and good performance, without explicit coding of air-combat tactics. The method's success is due to extensive feature development, reward shaping, and trajectory sampling. An accompanying fast and effective rollout-based policy extraction method enables online implementation. Simulation results demonstrate the robustness of the method against an opponent, beginning from both offensive and defensive situations. Flight results are also presented using unmanned aircraft flown in the Massachusetts Institute of Technology's real-time indoor autonomous vehicle test environment.
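To make the general approach concrete, the sketch below illustrates, in broad strokes, how a value function learned offline via approximate dynamic programming can drive online, rollout-based maneuver selection: each candidate maneuver is simulated forward for a short horizon and scored with a learned value-function approximation, and the best-scoring maneuver is flown. This is an illustrative toy, not the paper's implementation; the kinematics, feature set, weights, and names such as step, features, and rollout_policy are assumptions introduced here.

```python
import numpy as np

# Illustrative sketch of rollout-based policy extraction with a learned
# value-function approximation (linear in hand-crafted features).
# The dynamics, features, and weights below are placeholders, not the
# paper's actual models.

ACTIONS = [-1.0, 0.0, 1.0]   # candidate turn-rate commands [rad/s]
DT = 0.25                    # decision-step length [s]
SPEED = 2.5                  # fixed airspeed [m/s]

def step(state, action, dt=DT):
    """Level-flight, fixed-velocity kinematics; state = (x, y, heading)."""
    x, y, psi = state
    psi = psi + action * dt
    return (x + SPEED * np.cos(psi) * dt,
            y + SPEED * np.sin(psi) * dt,
            psi)

def features(own, adversary):
    """Placeholder relative-geometry features (range and pointing terms)."""
    dx, dy = adversary[0] - own[0], adversary[1] - own[1]
    rng = np.hypot(dx, dy)
    los = np.arctan2(dy, dx)
    aspect = np.cos(own[2] - los)    # 1 when pointing at the adversary
    return np.array([1.0, rng, aspect, rng * aspect])

def rollout_policy(own, adversary, weights, horizon=3):
    """Pick the maneuver whose short rollout ends in the highest-value state."""
    best_action, best_value = None, -np.inf
    for a in ACTIONS:
        s = own
        for _ in range(horizon):     # roll the candidate maneuver forward
            s = step(s, a)
        value = float(weights @ features(s, adversary))
        if value > best_value:
            best_action, best_value = a, value
    return best_action

# Example use with arbitrary (untrained) weights:
w = np.array([0.0, -0.1, 1.0, 0.05])
print(rollout_policy(own=(0.0, 0.0, 0.0), adversary=(10.0, 5.0, np.pi), weights=w))
```

The point of the sketch is the division of labor: the expensive value-function learning happens offline, so the online policy needs only a handful of short forward simulations per decision step, which is what permits fast response over long effective planning horizons.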
