A Dynamic Intelligence Test Framework for Evaluating AI Agents

In our recent work on the measurement of (collective) intelligence, we used a dynamic intelligence test to measure and compare the performance of artificial agents. In this paper we give a detailed technical description of the testing framework, its design and implementation, and show how it can be used to quantitatively evaluate general-purpose, single- and multi-agent artificial intelligence (AI). The source code and scripts to run experiments have been released as open source, and instructions on how to administer the test to artificial agents are outlined. This allows new agent behaviours to be evaluated and the scope of the test to be extended. Alternative testing environments are discussed, along with other considerations relevant to the robustness of multi-agent performance tests. The aim is to encourage the AI community to quantitatively evaluate new types of heuristics and algorithms, individually and collectively, using different communication and interaction protocols, and thus pave the way towards a rigorous, formal and unified testing framework for general-purpose agents.
