Representation Discovery for MDPs Using Bisimulation Metrics
Doina Precup | Prakash Panangaden | Gheorghe Comanici | Sherry Shanshan Ruan
[1] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[2] Radha Jagadeesan,et al. Metrics for labelled Markov processes , 2004, Theor. Comput. Sci..
[3] Doina Precup,et al. Bisimulation Metrics are Optimal Value Functions , 2014, UAI.
[4] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.
[5] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[6] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[7] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[8] Robert Givan,et al. Equivalence notions and model minimization in Markov decision processes , 2003, Artif. Intell..
[9] Kim G. Larsen,et al. Bisimulation through Probabilistic Testing , 1991, Inf. Comput..
[10] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[11] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real , 1993 .
[12] Doina Precup,et al. Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics , 2011, AAAI.
[13] Radha Jagadeesan,et al. Metrics for Labeled Markov Systems , 1999, CONCUR.
[14] Doina Precup,et al. On-the-Fly Algorithms for Bisimulation Metrics , 2012, 2012 Ninth International Conference on Quantitative Evaluation of Systems.
[15] Balaraman Ravindran,et al. Model Minimization in Hierarchical Reinforcement Learning , 2002, SARA.
[16] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[17] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[18] Doina Precup,et al. Metrics for Finite Markov Decision Processes , 2004, AAAI.
[19] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.