Reinforcement learning for factored Markov decision processes
[1] R. Bellman. A Markovian Decision Process , 1957 .
[2] R. A. Howard. Dynamic Programming and Markov Processes , 1960 .
[3] L. Baum,et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .
[4] L. Baum,et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .
[5] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[6] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .
[7] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[8] R. Shumway,et al. AN APPROACH TO TIME SERIES SMOOTHING AND FORECASTING USING THE EM ALGORITHM , 1982 .
[9] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[10] Donald Geman,et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Geoffrey E. Hinton,et al. A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..
[12] D. Rumelhart. Learning internal representations by back-propagating errors , 1986 .
[13] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[14] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .
[15] Ross D. Shachter. Evaluating Influence Diagrams , 1986, Oper. Res..
[16] Gregory F. Cooper,et al. A Method for Using Belief Networks as Influence Diagrams , 1988, UAI.
[18] C. Watkins. Learning from delayed rewards , 1989 .
[19] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[20] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[21] C. Robert Kenley,et al. Gaussian influence diagrams , 1989 .
[22] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[23] Gregory F. Cooper,et al. The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks , 1990, Artif. Intell..
[24] David Haussler,et al. Unsupervised learning of distributions on binary vectors using two layer networks , 1991, NIPS.
[25] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[26] P. Dayan. The Convergence of TD(λ) for General λ , 1992, Machine Learning.
[27] Andreas Stolcke,et al. Hidden Markov Model Induction by Bayesian Model Merging , 1992, NIPS.
[28] Ross D. Shachter,et al. Decision Making Using Probabilistic Inference Methods , 1992, UAI.
[29] Radford M. Neal. Connectionist Learning of Belief Networks , 1992, Artif. Intell..
[30] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[31] Holly A. Yanco,et al. An adaptive communication protocol for cooperating mobile robots , 1993 .
[32] Michael Luby,et al. Approximating Probabilistic Inference in Bayesian Belief Networks is NP-Hard , 1993, Artif. Intell..
[33] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[34] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[35] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[36] Solomon Eyal Shimony,et al. Finding MAPs for Belief Networks is NP-Hard , 1994, Artif. Intell..
[37] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[38] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[39] Kenji Doya,et al. Temporal Difference Learning in Continuous Time and Space , 1995, NIPS.
[40] Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
[41] Michael I. Jordan,et al. Reinforcement Learning by Probability Matching , 1995, NIPS.
[42] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[43] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[44] Radford M. Neal. Bayesian Learning for Neural Networks , 1995 .
[45] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[46] M. Littman,et al. Efficient dynamic-programming updates in partially observable Markov decision processes , 1995 .
[47] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..
[48] David J. C. MacKay,et al. Bayesian Non-linear Modeling for the Prediction Competition , 1996 .
[49] Wenju Liu,et al. Planning in Stochastic Domains: Problem Characteristics and Approximation , 1996 .
[50] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[51] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming , 1996, Athena Scientific.
[52] Craig Boutilier,et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations , 1996, AAAI/IAAI, Vol. 2.
[53] Craig Boutilier,et al. Approximate Value Trees in Structured Dynamic Programming , 1996, ICML.
[54] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.
[55] Michael I. Jordan,et al. Variational methods for inference and estimation in graphical models , 1997 .
[56] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[57] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..
[58] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[59] Craig Boutilier,et al. Abstraction and Approximate Decision-Theoretic Planning , 1997, Artif. Intell..
[60] Michael L. Littman,et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.
[61] A. McCallum. Efficient Exploration in Reinforcement Learning with Hidden State , 1997 .
[62] Doina Precup,et al. Theoretical Results on Reinforcement Learning with Temporally Abstract Options , 1998, ECML.
[63] Brian Sallans,et al. A Hierarchical Community of Experts , 1999, Learning in Graphical Models.
[64] Stuart J. Russell. Learning agents for uncertain environments (extended abstract) , 1998, COLT '98.
[65] Nevin Lianwen Zhang,et al. Probabilistic Inference in Influence Diagrams , 1998, Comput. Intell..
[66] Kee-Eung Kim,et al. Solving Stochastic Planning Problems with Large State and Action Spaces , 1998, AIPS.
[67] Kee-Eung Kim,et al. Solving Very Large Weakly Coupled Markov Decision Processes , 1998, AAAI/IAAI.
[68] Radford M. Neal. Assessing Relevance determination methods using DELVE , 1998 .
[69] Xavier Boyen,et al. Tractable Inference for Complex Stochastic Processes , 1998, UAI.
[70] Eric A. Hansen,et al. Solving POMDPs by Searching in Policy Space , 1998, UAI.
[71] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[72] Amy McGovern,et al. AcQuire-macros: An Algorithm for Automatically Learning Macro-actions , 1998 .
[73] A. Cassandra,et al. Exact and approximate algorithms for partially observable markov decision processes , 1998 .
[74] Michael I. Jordan,et al. An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.
[75] Shin Ishii,et al. Reinforcement Learning Based on On-Line EM Algorithm , 1998, NIPS.
[76] Christopher M. Bishop,et al. Neural networks and machine learning , 1998 .
[77] Mark A. Shayman,et al. Solving POMDP by On-policy Linear Approximate Learning Algorithm , 1999 .
[78] Brian Sallans,et al. Learning Factored Representations for Partially Observable Markov Decision Processes , 1999, NIPS.
[79] Daphne Koller,et al. Computing Factored Value Functions for Policies in Structured MDPs , 1999, IJCAI.
[80] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[81] David A. McAllester,et al. Approximate Planning for Factored POMDPs using Belief State Simplification , 1999, UAI.
[82] Geoffrey E. Hinton. Products of experts , 1999 .
[83] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[84] Daphne Koller,et al. Reinforcement Learning Using Approximate Belief States , 1999, NIPS.
[85] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.
[86] Andrew W. Moore,et al. Distributed Value Functions , 1999, ICML.
[87] Sebastian Thrun,et al. Monte Carlo POMDPs , 1999, NIPS.
[88] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[89] Craig Boutilier,et al. Value-Directed Belief State Approximation for POMDPs , 2000, UAI.
[90] Kee-Eung Kim,et al. Learning to Cooperate via Policy Search , 2000, UAI.
[91] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[92] Geoffrey E. Hinton,et al. Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task , 2000, NIPS.
[93] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[94] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region , 2000, NIPS.
[95] Yee Whye Teh,et al. Rate-coded Restricted Boltzmann Machines for Face Recognition , 2000, NIPS.
[96] Craig Boutilier,et al. Stochastic dynamic programming with factored representations , 2000, Artif. Intell..
[97] Jesse Hoey,et al. APRICODD: Approximate Policy Construction Using Decision Diagrams , 2000, NIPS.
[98] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[99] Prakash P. Shenoy,et al. A Forward Monte Carlo Method For Solving Influence Diagrams Using Local Computation , 2000 .
[100] Katia P. Sycara,et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State , 2001, ICML.
[101] C. Lee Giles,et al. How communication can improve the performance of multi-agent systems , 2001, AGENTS '01.
[102] Craig Boutilier,et al. Value-directed sampling methods for monitoring POMDPs , 2001, UAI.
[103] Craig Boutilier,et al. Vector-space Analysis of Belief-state Approximation for POMDPs , 2001, UAI.
[104] Yee Whye Teh,et al. Discovering Multiple Constraints that are Frequently Approximately Satisfied , 2001, UAI.
[105] Zoubin Ghahramani,et al. Variational Learning for Switching State-Space Models , 2000, Neural Computation.
[106] Geoffrey E. Hinton,et al. Products of Hidden Markov Models , 2001, AISTATS.
[107] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.
[108] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems , 1960 .
[109] Simon J. Godsill,et al. Marginal maximum a posteriori estimation using Markov chain Monte Carlo , 2002, Stat. Comput..
[110] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[111] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[112] Peter Dayan,et al. Analytical Mean Squared Error Curves for Temporal Difference Learning , 1996, Machine Learning.
[113] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 1996, Machine Learning.
[114] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.
[115] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[116] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[117] R. Dearden. Structured Prioritized Sweeping , 2001, ICML.