Assessing Human Error Against a Benchmark of Perfection

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors. To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world. We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.
