Direct Human-AI Comparison in the Animal-AI Environment

Artificial intelligence is making rapid progress towards more sophisticated and powerful systems. However, recognition of several problems with modern machine learning approaches has prompted a shift in AI benchmarking away from task-oriented testing (such as chess and Go) towards ability-oriented testing, in which AI systems are tested on their capacity to solve certain kinds of novel problems. The Animal-AI Environment is one such benchmark; it aims to apply the ability-oriented testing used in comparative psychology to AI systems. Here, we present the first direct human-AI comparison in the Animal-AI Environment, using children aged 6–10 (n = 52). We found that children of all ages performed significantly better than a sample of 30 AIs across most of the tests we examined, and significantly better than the two top-scoring AIs, “ironbar” and “Trrrrr,” from the Animal-AI Olympics Competition 2019. While children and AIs performed similarly on basic navigational tasks, AIs performed significantly worse on more complex cognitive tests, including detour tasks, spatial elimination tasks, and object permanence tasks, indicating that AIs lack several cognitive abilities that children aged 6–10 possess. Both children and AIs performed poorly on tool-use tasks, suggesting that these tests are challenging for both biological and non-biological machines.
