Structure in the Space of Value Functions

Efficiently solving many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find such fragmentations by applying unsupervised mixture-model learning to data derived from the optimal value functions of multiple tasks, and we show that the resulting fragmentations accord with observable structure in the environments. Further, we present evidence that such fragments are useful in a practical reinforcement learning setting, by facilitating online actor-critic learning in multiple-goal MDPs.
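As a rough illustration of the clustering step described above, the sketch below (not the paper's actual implementation) fits a Gaussian mixture model to per-state vectors of optimal values collected across several goal tasks, so that states whose value profiles co-vary across goals are grouped into a common fragment. The value matrix V, the number of fragments, and the normalisation step are all assumptions made for the example.

```python
# Sketch: cluster states into "fragments" from their optimal values across tasks.
# Assumes V is an (n_states, n_tasks) array of optimal values, one column per
# goal task; n_fragments is an assumed number of mixture components.
import numpy as np
from sklearn.mixture import GaussianMixture

def find_fragments(V, n_fragments=4, seed=0):
    # Normalise each state's value profile so clustering reflects the shape of
    # the profile across goals rather than its overall magnitude (an assumed
    # preprocessing choice, not prescribed by the paper).
    profiles = (V - V.mean(axis=1, keepdims=True)) / (V.std(axis=1, keepdims=True) + 1e-8)
    gmm = GaussianMixture(n_components=n_fragments, covariance_type="diag",
                          random_state=seed)
    gmm.fit(profiles)
    # Each state is assigned to the mixture component that best explains its
    # cross-task value profile; states sharing a component form one fragment.
    return gmm.predict(profiles)

# Toy usage with random data standing in for optimal value functions.
rng = np.random.default_rng(0)
V = rng.random((100, 8))            # 100 states, 8 goal tasks
labels = find_fragments(V, n_fragments=4)
print(labels[:10])
```

The resulting fragment labels could then serve as the state abstraction over which the actor-critic learner mentioned in the abstract operates.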
