Paper information - Learning for stochastic dynamic programming

Learning for stochastic dynamic programming

We present experimental results on learning function values (i.e., Bellman values) in stochastic dynamic programming (SDP). All results come from openDP (opendp.sourceforge.net), which is freely available as source code, so they can be reproduced. The goal is an independent comparison of learning methods within the SDP framework.
