Deceptive Kernel Function on Observations of Discrete POMDP

This paper studies deception applied to an agent in a partially observable Markov decision process. We introduce a deceptive kernel function (the kernel) that acts on the agent's observations in a discrete POMDP. For three characteristic algorithms the agent may use, namely value iteration, value function approximation, and POMCP, we analyze how the agent's belief is misled by the falsified observations the kernel outputs, and we anticipate the likely threat to the agent's reward and other aspects of its performance. We validate these expectations and explore further detrimental effects of the deception by experimenting on two POMDP problems. The results show that a kernel applied to the agent's observations can distort its belief and substantially lower its resulting rewards; moreover, certain implementations of the kernel can induce other abnormal behaviors in the agent.
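For intuition, here is a minimal Python sketch of one way such a kernel can be modeled: a row-stochastic matrix that remaps the true observation before it reaches the agent's Bayesian belief update. The toy two-state model and the names T, O, and K are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A minimal sketch (not the paper's implementation) of a deceptive kernel
# acting on the observations of a discrete POMDP. All names and numbers
# below (T, O, K, the toy two-state model) are illustrative assumptions.

T = {0: np.array([[0.9, 0.1],
                  [0.1, 0.9]])}    # T[a][s, s'] = P(s' | s, a)
O = {0: np.array([[0.8, 0.2],
                  [0.2, 0.8]])}    # O[a][s', o] = P(o | s', a)
K = np.array([[0.1, 0.9],
              [0.9, 0.1]])         # K[o, o'] = P(o' | o): this kernel mostly flips o

def belief_update(b, a, o, T, O):
    """Bayesian belief update: b'(s') is proportional to O(o|s',a) * sum_s T(s'|s,a) b(s)."""
    b_pred = T[a].T @ b            # predict the next-state distribution
    b_new = O[a][:, o] * b_pred    # weight by the observation likelihood
    return b_new / b_new.sum()

def deceive(o_true, K, rng):
    """Sample the falsified observation the agent actually receives."""
    return rng.choice(K.shape[1], p=K[o_true])

rng = np.random.default_rng(0)
s = 0                              # true (hidden) state
b_true = np.array([0.5, 0.5])      # belief an undeceived agent would hold
b_fake = np.array([0.5, 0.5])      # belief of the deceived agent
for _ in range(20):
    a = 0
    s = rng.choice(2, p=T[a][s])   # environment transitions
    o = rng.choice(2, p=O[a][s])   # true observation is emitted
    b_true = belief_update(b_true, a, o, T, O)
    b_fake = belief_update(b_fake, a, deceive(o, K, rng), T, O)
print("truthful belief:", b_true)
print("deceived belief:", b_fake)  # drifts toward the wrong state
```

Under a kernel that mostly flips the observation, the deceived belief concentrates on the wrong state while the truthful belief tracks the true one; the paper's analysis concerns how such distorted beliefs degrade the agent's reward under value iteration, value function approximation, and POMCP.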
