Reinforcement Learning with Factored States and Actions