Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

Large-scale screening for potential threats with limited resources and screening capacity is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when limited resource capacity leaves gaps in screening. To capture this game between ports and adversaries, the problem has previously been represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity of solving TSGs and the uncertainty in screenee arrivals, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight times and are not bound by fixed time windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as each passenger arrives, while satisfying a hard bound on the acceptable risk of not screening a threat. To solve this online problem, we formulate it as a Reinforcement Learning (RL) problem with constraints on the action space that encode the hard bound on risk. We provide a novel way to efficiently enforce linear inequality constraints on the action output in Deep Reinforcement Learning. We show that our solution significantly reduces screenee wait time while guaranteeing a bound on risk.
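
To make the idea of a linear inequality constraint on an action concrete, the sketch below projects a raw action vector onto a single half-space {a : w·a ≤ b}. This is a generic closed-form Euclidean projection used only for illustration; it is not the enforcement method proposed in the paper, whose details are not given in the abstract. The function name `project_onto_halfspace` and the values of `w`, `b`, and `raw_action` are hypothetical.

```python
import numpy as np


def project_onto_halfspace(action, w, b):
    """Return the point closest to `action` (in Euclidean distance)
    that satisfies the linear inequality w @ a <= b.

    Illustrative assumption: a single constraint with a nonzero w.
    """
    action = np.asarray(action, dtype=float)
    w = np.asarray(w, dtype=float)
    violation = w @ action - b
    if violation <= 0.0:
        # Already feasible: leave the action unchanged.
        return action.copy()
    # Move along -w just far enough to land on the boundary w @ a == b.
    return action - (violation / (w @ w)) * w


# Hypothetical example: three action components whose sum must not exceed 1.5.
raw_action = np.array([0.9, 0.8, 0.7])
w = np.array([1.0, 1.0, 1.0])
b = 1.5
safe_action = project_onto_halfspace(raw_action, w, b)
```

In this example the raw action violates the constraint by 0.9, so each component is reduced by 0.3 and the result lies exactly on the constraint boundary. With several simultaneous constraints a closed form is generally not available and a small optimization problem would have to be solved instead.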

Milind Tambe | Pradeep Varakantham | Arunesh Sinha | Sanket Shah | Andrew Perrault
