Assessing Human Error Against a Benchmark of Perfection

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors. To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world. We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.
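As an illustration of how a tablebase can act as ground truth, the sketch below labels a move as an error when it lowers the game-theoretic value (win, draw, or loss) that the player could have secured under perfect play. It assumes each legal move's resulting position has already been looked up in a tablebase and scored from the perspective of the side to move in that resulting position. The `WDL` encoding, the function names, and the `error_fraction` difficulty measure are hypothetical illustrations, not the authors' code or their exact feature definitions.

```python
from enum import IntEnum


class WDL(IntEnum):
    """Hypothetical game-theoretic value, from the perspective of the side to move."""
    LOSS = -1
    DRAW = 0
    WIN = 1


def value_after_move(child_value: WDL) -> WDL:
    # The resulting position is scored for the opponent (now to move), so negate it
    # to get the value for the player who made the move.
    return WDL(-child_value)


def optimal_value(child_values: dict[str, WDL]) -> WDL:
    # Value of the current position under perfect play: the mover picks the best child.
    # Assumes at least one legal move (non-terminal position).
    return max(value_after_move(v) for v in child_values.values())


def is_error(move: str, child_values: dict[str, WDL]) -> bool:
    # An error strictly lowers the value the mover could have secured,
    # e.g. turning a win into a draw or loss, or a draw into a loss.
    return value_after_move(child_values[move]) < optimal_value(child_values)


def error_fraction(child_values: dict[str, WDL]) -> float:
    # Illustrative (assumed) difficulty measure: share of legal moves that are errors.
    errors = sum(is_error(m, child_values) for m in child_values)
    return errors / len(child_values)


if __name__ == "__main__":
    # Toy position with three hypothetical legal moves m1..m3, each mapped to the
    # tablebase value of the resulting position for the opponent.
    example = {"m1": WDL.LOSS, "m2": WDL.DRAW, "m3": WDL.WIN}
    print(optimal_value(example).name)                    # WIN
    print([m for m in example if is_error(m, example)])   # ['m2', 'm3']
    print(round(error_fraction(example), 3))              # 0.667
```

Because the labels come from a solved position rather than from an engine's heuristic evaluation, a move counts as an error only when it provably changes the outcome under perfect play. This is what makes tablebase positions usable as a benchmark of perfection.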
