Assessing Human Error Against a Benchmark of Perfection

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors. To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging for even the best players in the world. We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.

Jon M. Kleinberg | Sendhil Mullainathan | Ashton Anderson
