Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task