Paper Information - Playing Repeated Network Interdiction Games with Semi-Bandit Feedback - 字舞流文

Playing Repeated Network Interdiction Games with Semi-Bandit Feedback

We study repeated network interdiction games with no prior knowledge of the adversary or the environment, a setting that can model many real-world network security domains. Existing work often assumes that the defender has ample information and neglects the frequent interactions between the two players; these assumptions are unrealistic and impractical, and thus unsuitable for our setting. We therefore provide the first defender strategy that enjoys good theoretical and practical performance guarantees, obtained by applying the adversarial online learning approach. In particular, we model the repeated network interdiction game with no prior knowledge as an online linear optimization problem, and we propose a novel and efficient online learning algorithm, SBGA, that exploits the semi-bandit feedback unique to network security domains. We prove that SBGA achieves sublinear regret against an adaptive adversary, compared with both the best fixed strategy in hindsight and a near-optimal adaptive strategy. Extensive experiments also show that SBGA significantly outperforms existing approaches and converges quickly.

Bo An | Long Tran-Thanh | Qingyu Guo

[1] Adam Tauman Kalai, et al. Playing Games with Approximation Algorithms, 2009, SIAM J. Comput.

[2] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.

[3] Bo An, et al. Efficient Resource Allocation for Protecting Coral Reef Ecosystems, 2016, IJCAI.

[4] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.

[5] R. Kevin Wood, et al. Deterministic network interdiction, 1993.

[6] Amos Azaria, et al. Analyzing the Effectiveness of Adversary Modeling in Security Games, 2013, AAAI.

[7] Bo An, et al. Security games with surveillance cost and optimal timing of attack execution, 2013, AAMAS.

[8] Milind Tambe, et al. "A Game of Thrones": When Human Behavior Models Compete in Repeated Stackelberg Security Games, 2015, AAMAS.

[9] Rong Yang, et al. Adaptive resource allocation for wildlife protection against illegal poachers, 2014, AAMAS.

[10] J. Limb, et al. Editorial on the IEEE/OSA Journal of Lightwave Technology and the IEEE Journal on Selected Areas in Communications, 1986.

[11] Chunyan Miao, et al. Optimal Interdiction of Illegal Network Flow, 2016, IJCAI.

[12] Milind Tambe, et al. Robust Protection of Fisheries with COmPASS, 2014, AAAI.

[13] Bo An, et al. Coalitional Security Games, 2016, AAMAS.

[14] Ariel D. Procaccia, et al. Learning Optimal Commitment to Overcome Insecurity, 2014, NIPS.

[15] June S. Beittel. Mexico: Organized Crime and Drug Trafficking Organizations, 2019.

[16] Thomas P. Hayes, et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, 2006, SODA '06.

[17] D. McFadden. Quantal Choice Analysis: A Survey, 1976.

[18] Nicholas R. Jennings, et al. Playing Repeated Security Games with No Prior Knowledge, 2016, AAMAS.

[19] Bo An, et al. Stop Nuclear Smuggling Through Efficient Container Inspection, 2017, AAMAS.

[20] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[21] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.

[22] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.

[23] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[24] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[25] Bernard M. Waxman, et al. Routing of multipoint connections, 1988, IEEE J. Sel. Areas Commun.

[26] Colin Camerer. Behavioral Game Theory: Experiments in Strategic Interaction, 2003.

[27] Zhen Wang, et al. Computing Optimal Monitoring Strategy for Detecting Terrorist Plots, 2016, AAAI.

[28] Vincent Conitzer, et al. A double oracle algorithm for zero-sum security games on graphs, 2011, AAMAS.

[29] Gergely Neu, et al. Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits, 2015, J. Mach. Learn. Res.

[30] N. Assimakopoulos, et al. A network interdiction model for hospital infection control, 1987, Computers in biology and medicine.