Game Design for Eliciting Distinguishable Behavior

The ability to infer latent psychological traits from human behavior is key to developing personalized machine learning systems that interact with humans. Approaches to inferring such traits range from surveys to manually constructed experiments and games. However, these traditional games are limited because they are typically designed using heuristics. In this paper, we formulate the task of designing behavior-diagnostic games, that is, games that elicit distinguishable behavior, as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. We instantiate our framework by using prospect theory to model varying player traits and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually designed ones by a large margin.
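
As a rough illustration of the objective described above, the standard variational lower bound on mutual information can be written as follows. The notation is an assumption made for this sketch and is not taken from the abstract: \theta denotes a player's latent trait, M the game (an MDP), \tau the behavior the player produces in M, and q_\phi an auxiliary model that predicts the trait from observed behavior.

```latex
\begin{align}
I(\theta; \tau \mid M)
  &= H(\theta) - H(\theta \mid \tau, M) \\
  &\ge H(\theta)
     + \mathbb{E}_{\theta \sim p(\theta),\; \tau \sim p(\tau \mid \theta, M)}
       \bigl[ \log q_\phi(\theta \mid \tau) \bigr]
\end{align}
```

The inequality holds for any q_\phi because the gap is an expected KL divergence between the true posterior p(\theta \mid \tau, M) and q_\phi(\theta \mid \tau), which is non-negative. Under this reading, maximizing the right-hand side jointly over the game M and the predictor q_\phi favors games in which players with different traits behave in ways that are easy to tell apart.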
