Learning and Multiagent Reasoning for Autonomous Agents

One goal of Artificial Intelligence is to enable the creation of robust, fully autonomous agents that can coexist with us in the real world. Such agents will need to be able to learn, both to correct and circumvent their inevitable imperfections and to keep up with a dynamically changing world. They will also need to be able to interact with one another, whether they share common goals, pursue independent goals, or have goals in direct conflict. This paper presents current research directions in machine learning, multiagent reasoning, and robotics, and advocates their unification within concrete application domains. Ideally, new theoretical results in each separate area will inform practical implementations, while innovations from concrete multiagent applications will drive new theoretical pursuits; together, these synergistic research approaches will lead us towards the goal of fully autonomous agents.
