Evaluating Adaptive Deception Strategies for Cyber Defense with Human Adversaries

We investigate the effectiveness of various algorithms for defensive cyber-deception in an adversarial decision-making task using human experiments. Our combinatorial Multi-Armed Bandit task represents an abstract version of a realistic problem in cybersecurity: allocating limited defensive resources so that an adversary is most likely to be deceived into attacking "fake" nodes (i.e., honeypots) instead of the real ones. We propose six algorithms with different degrees of determinism, adaptivity, and customization to the human adversary's actions. We test these algorithms in six separate behavioral studies, in which human participants are paired against each of the six types of defense. We measure the effectiveness of the algorithms by the extent to which humans learn the defense strategies: the less an adversary is able to learn and exploit a strategy, the more successful the deception. We find that the adaptivity of the strategy is more important than the expected optimality of the algorithm. Humans learned and took advantage of defense algorithms that were deterministic, nonadaptive, and not customized. At the same time, not all algorithms that were nondeterministic, adaptive, and customized were effective. The Learning with Linear Rewards (LLR) algorithm, the one that was purely adaptive, was the most successful, suggesting that adaptivity is an important feature of defense algorithms. New ways to customize defense strategies to the adversary's behavior are needed.
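For concreteness, below is a minimal sketch of an LLR-style defender, assuming the combinatorial action is choosing k of n nodes to disguise as honeypots and that the defender observes a per-node reward for each chosen node (semi-bandit feedback). The class and method names (`LLRDefender`, `select`, `update`) and the reward convention are illustrative assumptions, not the authors' implementation.

```python
import math

class LLRDefender:
    """Sketch of Learning with Linear Rewards (LLR) for honeypot allocation:
    each round, pick k of n nodes to disguise as honeypots (assumed setup)."""

    def __init__(self, n_nodes, k):
        self.n = n_nodes
        self.k = k                    # honeypots placed per round (action size L = k)
        self.means = [0.0] * n_nodes  # empirical mean reward per node
        self.counts = [0] * n_nodes   # times each node's reward was observed
        self.t = 0                    # rounds played so far

    def select(self):
        """Return the k node indices with the highest LLR indices."""
        self.t += 1
        # Initialization: play every node at least once before trusting indices.
        unseen = [i for i in range(self.n) if self.counts[i] == 0]
        if unseen:
            forced = unseen[: self.k]
            rest = sorted((i for i in range(self.n) if i not in forced),
                          key=lambda i: self.means[i], reverse=True)
            return forced + rest[: self.k - len(forced)]
        # LLR index: mean + sqrt((L+1) ln t / m_i), with L = k arms per action.
        index = lambda i: self.means[i] + math.sqrt(
            (self.k + 1) * math.log(self.t) / self.counts[i])
        return sorted(range(self.n), key=index, reverse=True)[: self.k]

    def update(self, chosen, rewards):
        """rewards[j]: e.g. 1 if the adversary attacked honeypot chosen[j]
        (assumed reward convention), 0 otherwise."""
        for i, r in zip(chosen, rewards):
            self.counts[i] += 1
            self.means[i] += (r - self.means[i]) / self.counts[i]
```

Because the action set here is all size-k subsets, maximizing the linear sum of per-node indices reduces exactly to taking the k largest indices; a richer combinatorial structure would require a structured argmax in `select`.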