Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

We pose an active perception problem in which an autonomous agent interacts with a second agent whose behavior may be adversarial. Because the other agent’s intent is uncertain, the objective is to collect further evidence that helps discriminate potential threats. The main technical challenges are the partial observability of the other agent’s intent, the modeling of the adversary, and the modeling of the associated uncertainty. In particular, an adversarial agent may try to mislead the autonomous agent with a deceptive strategy learned from past experience. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent’s actions, the resulting policy is more robust to unmodeled adversarial strategies. We show this improved robustness empirically, in comparison with a standard chance-constrained partially observable Markov decision process (POMDP) robust approach, against an adversary that adapts to and exploits the autonomous agent’s policy.
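As a rough illustration of two ingredients named above, the following sketch shows a Bayesian belief update over a discrete set of intent hypotheses and a Boltzmann (softmax) action distribution, which is the form a maximum entropy policy takes over soft Q-values. It is not the authors' implementation: the two-hypothesis intent set, the likelihood values, the Q-values, the temperature, and all function names are assumptions chosen only for illustration.

```python
import math


def update_belief(belief, likelihoods):
    """Bayes rule over a discrete set of intent hypotheses.

    belief[i] is the prior P(intent i); likelihoods[i] is the
    (hypothetical) P(observation | intent i).
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation impossible under every hypothesis: keep the prior.
        return list(belief)
    return [u / total for u in unnormalized]


def softmax_policy(q_values, temperature=1.0):
    """Boltzmann distribution: pi(a) proportional to exp(Q(a) / temperature)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical numbers: prior over (benign, adversarial) intent and the
# likelihood of one observed behavior under each hypothesis.
belief = update_belief([0.5, 0.5], [0.2, 0.6])

# Hypothetical Q-values for three actions at the current belief.
policy_sharp = softmax_policy([1.0, 0.5, 0.0], temperature=0.1)
policy_soft = softmax_policy([1.0, 0.5, 0.0], temperature=1.0)
```

A higher temperature spreads probability across actions, which raises the policy's entropy and makes the agent's next action harder for an adversary to predict; a lower temperature approaches a greedy, deterministic choice.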
