Adaptive modelling and planning for learning intelligent behaviour

An intelligent agent must be capable of using its past experience to develop an understanding of how its actions affect the world in which it is situated. Given some objective, the agent must be able to use this understanding effectively to produce a plan that is robust to the uncertainty present in the world. This thesis presents a novel computational framework, the Adaptive Modelling and Planning System (AMPS), that aims to meet these requirements for intelligence. The agent's challenge is to use its experience in the world to generate a model. In problems with large state and action spaces, the agent can generalise from limited experience by grouping together similar states and actions, effectively partitioning the state and action spaces into finite sets of regions; this process is called abstraction. Several abstraction approaches have been proposed in the literature, but existing algorithms have significant limitations: they generally only increase resolution, require a large amount of data before changing the abstraction, do not generalise over actions, and are computationally expensive. AMPS addresses these problems with a new kind of approach. It splits and merges existing regions in its abstraction according to a set of heuristics. Splits are introduced using a mechanism related to supervised learning, defined in a general way that allows AMPS to leverage a wide variety of representations. Regions are merged when an analysis of the current plan indicates that doing so could be useful. Because several different regions may require revision at any given time, AMPS prioritises revision so as to make the best use of whatever computational resources are available. Changes in the abstraction lead to changes in the model, which in turn require changes to the plan. AMPS also prioritises the planning process, and when the agent has time, it replans in high-priority regions. This thesis demonstrates the flexibility and strength of this approach in learning intelligent behaviour from limited experience.

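To make the prioritised revision loop described above concrete, the following is a minimal sketch, assuming hypothetical names (Region data keyed by identifier, a split_score heuristic, and split/merge/replan task kinds) that are illustrative rather than the thesis's actual API. It shows the general shape of the idea: experience is attached to regions of the abstraction, candidate revisions are scored and queued, and a limited computational budget is spent on the highest-priority revisions first.

```python
# Hypothetical sketch of an AMPS-style revise/plan loop; names and heuristics
# are placeholders for illustration, not the system described in the thesis.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PrioritisedTask:
    priority: float                       # lower value = higher priority (negated score)
    region_id: int = field(compare=False)
    kind: str = field(compare=False)      # "split", "merge", or "replan"

class AbstractionPlanner:
    def __init__(self, regions):
        self.regions = regions            # region_id -> list of observed transitions
        self.queue = []                   # min-heap over negated scores

    def observe(self, region_id, transition):
        """Record a transition and queue the affected region for possible revision."""
        self.regions[region_id].append(transition)
        score = self.split_score(region_id)
        heapq.heappush(self.queue, PrioritisedTask(-score, region_id, "split"))

    def split_score(self, region_id):
        # Placeholder heuristic: more accumulated experience suggests a split
        # is more likely to be worthwhile in that region.
        return float(len(self.regions[region_id]))

    def step(self, budget):
        """Spend a limited computational budget on the highest-priority revisions."""
        for _ in range(budget):
            if not self.queue:
                break
            task = heapq.heappop(self.queue)
            if task.kind == "split":
                self.split(task.region_id)
            # merge and replan tasks would be dispatched from the same queue

    def split(self, region_id):
        print(f"splitting region {region_id}")  # stand-in for a real refinement step

planner = AbstractionPlanner({0: []})
planner.observe(0, ("s", "a", "s_next", 0.0))
planner.step(budget=1)
```

The single priority queue is the design choice of interest here: because splits, merges, and replanning all compete for the same budget, the agent can interleave model revision and planning whenever time is available, rather than committing to either in full.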