Research Grant Renewal Proposal: Reinforcement Learning and Artificial Intelligence Chair
[1] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[2] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[3] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[4] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[5] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[6] Ronald L. Rivest, et al. Diversity-based inference of finite automata, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS).
[7] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[8] Jorma Rissanen, et al. A universal data compression system, 1983, IEEE Trans. Inf. Theory.
[9] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[10] Leslie Pack Kaelbling, et al. Learning Topological Maps with Weak Local Odometric Information, 1997, IJCAI.
[11] Satinder P. Singh, et al. Predictive linear-Gaussian models of controlled stochastic dynamical systems, 2006, ICML.
[12] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.
[13] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[14] Michael R. James, et al. Learning and discovery of predictive state representations in dynamical systems with reset, 2004, ICML.
[15] Doina Precup, et al. Off-policy Learning with Options and Recognizers, 2005, NIPS.
[16] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[17] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[18] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[19] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[20] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[21] Satinder P. Singh, et al. Predictive state representations with options, 2006, ICML.
[22] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[23] Nathan R. Sturtevant, et al. Feature Construction for Reinforcement Learning in Hearts, 2006, Computers and Games.
[24] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[25] T. L. Graves, et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains, 1997.
[26] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[27] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[28] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[29] Michael H. Bowling, et al. Learning predictive state representations using non-blind policies, 2006, ICML.
[30] S. Haykin, et al. A Q-learning-based dynamic channel assignment technique for mobile communication systems, 1999.
[31] Herbert Jaeger, et al. Observable Operator Models for Discrete Stochastic Time Series, 2000, Neural Computation.
[32] Eric Wiewiora, et al. Learning predictive representations from a history, 2005, ICML.
[33] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[34] Satinder P. Singh, et al. On discovery and learning of models with predictive representations of state for agents with continuous actions and observations, 2007, AAMAS.
[35] Ralph Neuneier, et al. Enhancing Q-Learning for Optimal Asset Allocation, 1997, NIPS.
[36] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[37] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[38] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000.
[39] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[40] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[41] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[42] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control Optim.
[43] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.
[44] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[45] Ran El-Yaniv, et al. On Prediction Using Variable Order Markov Models, 2004, J. Artif. Intell. Res.
[46] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[47] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[48] Vishal Soni, et al. Relational Knowledge with Predictive State Representations, 2007, IJCAI.
[49] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[50] R. M. Dunn, et al. Brains, behavior, and robotics, 1983, Proceedings of the IEEE.
[51] Tao Wang, et al. Dual Representations for Dynamic Programming and Reinforcement Learning, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[52] Abhijit Gosavi, et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning, 2007.
[53] Shalabh Bhatnagar, et al. A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes, 2004, IEEE Transactions on Automatic Control.
[54] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[55] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML.
[56] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[57] Peter Marbach, et al. Simulation-based optimization of Markov decision processes, 1998.
[58] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[59] Satinder P. Singh, et al. Kernel Predictive Linear Gaussian models for nonlinear stochastic dynamical systems, 2006, ICML.
[60] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go, 2007, IJCAI.
[61] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[62] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[63] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.
[64] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[65] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[66] R. Agrawal, et al. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, 1989.
[67] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, J. Chem. Phys.
[68] Shalabh Bhatnagar, et al. Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes, 2007, Discret. Event Dyn. Syst.
[69] S. Yakowitz, et al. Machine learning and nonparametric bandit theory, 1995, IEEE Trans. Autom. Control.
[70] Richard S. Sutton, et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.