Learning for stochastic dynamic programming

We present experimental results on learning value functions (i.e., Bellman values) in stochastic dynamic programming (SDP). All results were obtained with openDP (opendp.sourceforge.net), whose source code is freely available, so every experiment can be reproduced. The goal is an independent comparison of learning methods within the SDP framework.
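The sketch below shows where a learner sits in SDP. It is a minimal approximate value iteration, not openDP code. Time runs backwards from a terminal value V_T = 0. At each step t we sample states x, estimate V_t(x) = min_u E_w[c(x, u, w) + V_{t+1}(f(x, u, w))] by Monte Carlo over the random input w, and fit a regressor to the resulting (x, V_t(x)) pairs. The toy stock-management problem, its costs and noise law, and the names transition, cost, knn_regressor and learn_bellman_values are all hypothetical. A k-nearest-neighbour regressor stands in for whichever learning method is being compared.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy problem (not from openDP): stock level x in [0, 1].
DECISIONS = [0.0, 0.1, 0.2, 0.3]  # candidate order quantities u
HORIZON = 5                       # number of time steps T


def transition(x: float, u: float, w: float) -> float:
    """Next stock level after ordering u and facing random demand w (clipped to [0, 1])."""
    return min(1.0, max(0.0, x + u - w))


def cost(x: float, u: float, w: float) -> float:
    """Assumed cost: 1 per unit ordered plus 3 per unit of unmet demand."""
    shortage = max(0.0, w - (x + u))
    return 1.0 * u + 3.0 * shortage


def knn_regressor(data: List[Tuple[float, float]], k: int = 5) -> Callable[[float], float]:
    """Return a predictor averaging the values of the k sampled states closest to x."""
    def predict(x: float) -> float:
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(v for _, v in nearest) / len(nearest)
    return predict


def learn_bellman_values(n_states: int = 100, n_noise: int = 20,
                         seed: int = 0) -> List[Callable[[float], float]]:
    """Backward induction: learn an approximation of V_t for t = T-1, ..., 0."""
    rng = random.Random(seed)
    value_next: Callable[[float], float] = lambda x: 0.0  # terminal value V_T = 0
    values: List[Callable[[float], float]] = []
    for _ in reversed(range(HORIZON)):
        samples = []
        for _ in range(n_states):
            x = rng.random()
            # Monte Carlo draws of the demand, shared by all decisions u.
            noises = [rng.uniform(0.0, 0.3) for _ in range(n_noise)]
            # Bellman target: best expected one-step cost plus learned cost-to-go.
            best = min(
                sum(cost(x, u, w) + value_next(transition(x, u, w)) for w in noises) / n_noise
                for u in DECISIONS
            )
            samples.append((x, best))
        # The learning step: fit V_t from sampled (state, Bellman value) pairs.
        value_next = knn_regressor(samples)
        values.append(value_next)
    values.reverse()  # values[t] approximates V_t
    return values


if __name__ == "__main__":
    V = learn_bellman_values()
    print(V[0](0.0), V[0](1.0))  # empty stock should cost more than full stock
```

Replacing knn_regressor with another learner (for example a support vector regressor, a neural network or a decision table) changes only the fitting line. That is the slot in which a comparison of learning methods for SDP operates.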
