iCORE Research Grant Proposal: Reinforcement Learning and Artificial Intelligence
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[3] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[4] Earl D. Sacerdoti, et al. Planning in a Hierarchy of Abstraction Spaces, 1974, IJCAI.
[5] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[6] Johan de Kleer, et al. A Qualitative Physics Based on Confluences, 1984, Artif. Intell.
[7] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[8] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[9] Ronald L. Rivest, et al. Diversity-based inference of finite automata, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS).
[10] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[11] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[12] Robert L. Grossman, et al. Timed Automata, 1999, CAV.
[13] Michael I. Jordan, et al. Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[14] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[15] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[16] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[17] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[18] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[19] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[20] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[21] Robert C. Holte, et al. Speeding up Problem Solving by Abstraction: A Graph Oriented Approach, 1996, Artif. Intell.
[22] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[23] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[24] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[25] Leslie Pack Kaelbling, et al. Learning Topological Maps with Weak Local Odometric Information, 1997, IJCAI.
[26] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[27] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[28] Ralph Neuneier, et al. Enhancing Q-Learning for Optimal Asset Allocation, 1997, NIPS.
[29] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.
[30] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.
[31] Balaraman Ravindran, et al. Hierarchical Optimal Control of MDPs, 1998.
[32] Balaraman Ravindran, et al. Improved Switching among Temporally Abstract Actions, 1998, NIPS.
[33] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[34] S. Haykin, et al. A Q-learning-based dynamic channel assignment technique for mobile communication systems, 1999.
[35] R. Holte, et al. A symbol's role in learning low-level control functions, 1999.
[36] Robert C. Holte, et al. A Space-Time Tradeoff for Memory-Based Heuristics, 1999, AAAI/IAAI.
[37] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[38] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[39] Herbert Jaeger, et al. Observable Operator Models for Discrete Stochastic Time Series, 2000, Neural Computation.
[40] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML 2000.
[41] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[42] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[43] Peter Stone, et al. Scaling Reinforcement Learning toward RoboCup Soccer, 2001, ICML.
[44] Doina Precup, et al. A Convergent Form of Approximate Policy Iteration, 2002, NIPS.
[45] Roland J. Zito-Wolf, et al. Learning search control knowledge: An explanation-based approach, 1991, Machine Learning.
[46] Allen Newell, et al. Chunking in Soar: The anatomy of a general learning mechanism, 1985, Machine Learning.
[47] Sridhar Mahadevan, et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 2004, Machine Learning.
[48] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.