Missile Defence and Interceptor Allocation by LVQ-RBF Multi-agent Hybrid Architecture

This paper proposes a solution methodology for a missile defence problem based on the theatre missile defence (TMD) concept. In a missile defence scenario, TMD is generally concerned with the optimal allocation of interceptors to counter attacking missiles. The problem is computationally complex because of its enormous state space. A Learning Vector Quantiser–Radial Basis Function (LVQ-RBF) multi-agent hybrid neural architecture is used as the learning structure, with Q-learning as the learning method. The LVQ-RBF multi-agent hybrid architecture overcomes the complex state-space issue through a partitioning and weighted-learning approach, and improves learning performance using local and global error criteria. The state space is first explored with a coarse partitioning by the LVQ neural network; fine partitioning of the state space is then performed by the multi-agent RBF neural network. A discrete reward scheme is used for the LVQ-RBF multi-agent hybrid architecture. Its hierarchical structure enables quicker convergence without loss of accuracy. The TMD simulation is performed with 500 assets and six asset priority levels.
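The two-stage scheme summarised above (coarse LVQ partitioning of the state space, then Q-learning with per-region RBF networks) can be illustrated with a minimal sketch. The class names, the reward-sign LVQ update rule, and all hyperparameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch of an LVQ-RBF hybrid for Q-learning, assuming a continuous
# state vector and a discrete action set. All structure and parameters here
# are hypothetical stand-ins for the architecture described in the abstract.

class LVQPartitioner:
    """Coarse partitioning of the state space with LVQ prototypes."""
    def __init__(self, n_prototypes, state_dim, lr=0.05):
        self.prototypes = np.random.randn(n_prototypes, state_dim)
        self.lr = lr

    def partition(self, state):
        # Index of the nearest prototype = coarse region of the state.
        return int(np.argmin(np.linalg.norm(self.prototypes - state, axis=1)))

    def update(self, state, reward):
        # Pull the winning prototype toward (positive reward) or push it away
        # from (non-positive reward) the observed state -- a simple LVQ-style rule.
        k = self.partition(state)
        sign = 1.0 if reward > 0 else -1.0
        self.prototypes[k] += sign * self.lr * (state - self.prototypes[k])


class RBFQAgent:
    """One RBF network per coarse region, approximating Q(s, a) locally."""
    def __init__(self, n_centers, state_dim, n_actions, sigma=1.0, lr=0.1):
        self.centers = np.random.randn(n_centers, state_dim)
        self.weights = np.zeros((n_centers, n_actions))
        self.sigma, self.lr = sigma, lr

    def _phi(self, state):
        # Gaussian RBF activations for the given state.
        d = np.linalg.norm(self.centers - state, axis=1)
        return np.exp(-(d ** 2) / (2 * self.sigma ** 2))

    def q_values(self, state):
        return self._phi(state) @ self.weights

    def update(self, state, action, target):
        # Gradient-style Q-learning update on the RBF output weights.
        phi = self._phi(state)
        td_error = target - phi @ self.weights[:, action]
        self.weights[:, action] += self.lr * td_error * phi


def q_learning_step(partitioner, agents, state, action, reward, next_state,
                    gamma=0.95):
    """One hybrid update: route by LVQ region, learn with the local RBF agent."""
    region = partitioner.partition(state)
    next_region = partitioner.partition(next_state)
    target = reward + gamma * np.max(agents[next_region].q_values(next_state))
    agents[region].update(state, action, target)
    partitioner.update(state, reward)
```

Routing each transition through the LVQ winner and updating only that region's RBF agent is what keeps the per-step cost manageable even when the overall state space (here, 500 assets with six priority levels) is very large.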
