iCORE Research Grant Proposal: Reinforcement Learning and Artificial Intelligence

We propose to create a new laboratory at the University of Alberta dedicated to research in reinforcement learning (RL) as an approach to artificial intelligence. RL is a body of theory and techniques for learning an optimal control policy in sequential decision-making situations. It applies to any task that involves taking a sequence of actions (e.g., flying a helicopter, playing backgammon, elevator scheduling, resource-constrained scheduling) where the effects of one action influence the utility of subsequent actions. RL methods are attracting increasing attention in engineering, psychology, and neuroscience because they can be applied as part of a system's normal operation, without requiring special supervision or training information. RL has already been applied to all of the tasks mentioned above, as well as in robotics, process control, communications networks, and finance.
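To make the idea of learning a control policy from sequential experience concrete, the following is a minimal sketch of tabular Q-learning, one of the best-known RL algorithms, applied to a hypothetical five-state chain task invented here for illustration (the states, actions, and reward structure are assumptions, not part of the proposal). The agent learns, purely from trial and error, that moving right toward the rewarding terminal state is optimal:

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a small deterministic chain MDP.

    States are 0..n_states-1. Action 1 moves right, action 0 moves
    left. Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Deterministic transition: clamp movement to the chain.
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s2 == n_states - 1 else 0.0
        done = s2 == n_states - 1
        return s2, reward, done

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection balances exploration
            # with exploiting the current value estimates.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Temporal-difference update toward the bootstrapped target.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy recovered from the learned values: move right everywhere.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

No supervisor ever tells the agent which action was correct; the policy emerges from the scalar reward signal alone, which is the property the proposal highlights.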
