Adversarial vulnerabilities of human decision-making

Significance
"What I cannot efficiently break, I cannot understand." Understanding the vulnerabilities of human choice processes allows us to detect, and potentially avoid, adversarial attacks. We develop a general framework for creating adversaries for human decision-making. The framework is based on recent developments in deep reinforcement learning and recurrent neural network models of choice, and can in principle be applied to any decision-making task and adversarial objective. We demonstrate the framework on three tasks involving choice, response inhibition, and social decision-making. In all three cases, the adversarial attack succeeded. Furthermore, we show various ways of interpreting the models to provide insight into the exploitability of human choice.

Abstract
Adversarial examples are carefully crafted input patterns that are surprisingly poorly classified by artificial and/or natural neural networks. Here we examine adversarial vulnerabilities in the processes responsible for learning and choice in humans. Building on recent recurrent neural network models of choice processes, we propose a general framework for generating adversarial opponents that can shape the choices of individuals in particular decision-making tasks toward the behavioral patterns desired by the adversary. We show the efficacy of the framework in three experiments involving action selection, response inhibition, and social decision-making. We further investigate the strategies used by the adversary to gain insight into the vulnerabilities of human choice. The framework may find applications across the behavioral sciences in helping detect and avoid flawed choice.
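To make the core idea concrete, here is a minimal sketch of the kind of attack the framework formalizes: an adversary that controls reward delivery steers a simulated learner toward a target action. This is not the paper's actual method (which trains a deep RL adversary against recurrent network models fit to human behavior); the Rescorla-Wagner learner, the greedy reward-withholding adversary, and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=3.0):
    """Softmax choice rule over action values (beta = inverse temperature)."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def run(adversarial, n_trials=500, alpha=0.3):
    """Simulate a Rescorla-Wagner Q-learner over two actions.

    If `adversarial` is True, a (hypothetical) greedy adversary assigns
    rewards so as to bias the learner toward a target action; otherwise
    both actions are rewarded at the same base rate.
    Returns the fraction of trials on which the target action was chosen.
    """
    q = np.zeros(2)          # learned action values
    target = 0               # action the adversary wants the learner to prefer
    target_chosen = []
    for _ in range(n_trials):
        a = rng.choice(2, p=softmax(q))
        if adversarial:
            # Greedy adversary: reward only the target action.
            r = 1.0 if a == target else 0.0
        else:
            # Neutral environment: both actions rewarded half the time.
            r = float(rng.random() < 0.5)
        q[a] += alpha * (r - q[a])   # Rescorla-Wagner update
        target_chosen.append(a == target)
    return float(np.mean(target_chosen))

neutral = run(adversarial=False)
attacked = run(adversarial=True)
print(f"target-choice rate: neutral={neutral:.2f}, attacked={attacked:.2f}")
```

Under the adversary, the learner's choices collapse toward the target action, whereas in the neutral environment it chooses each action about half the time. The paper's contribution is to learn such reward-assignment policies automatically, with deep RL, against recurrent models of real human learners rather than against this toy update rule.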
