Weighted importance sampling for off-policy learning with linear function approximation
A. Rupam Mahmood | Hado van Hasselt | Richard S. Sutton
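The paper's central idea is the contrast between ordinary and weighted importance sampling as off-policy estimators. As a minimal illustrative sketch (not code from the paper; the function names and the one-step setting are my own, and the full paper extends this to linear function approximation), the two estimators differ only in the normalizer:

```python
import numpy as np

def ordinary_is(returns, rhos):
    """Ordinary importance sampling: unbiased, but can have high variance.

    returns: array of sampled returns collected under a behavior policy mu.
    rhos:    array of importance ratios rho_i = pi(a_i) / mu(a_i).
    """
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    # Divide the ratio-weighted sum by the sample count n.
    return np.sum(rhos * returns) / len(returns)

def weighted_is(returns, rhos):
    """Weighted importance sampling: biased but consistent, lower variance.

    Normalizing by the sum of the ratios keeps the estimate inside the
    range of the observed returns.
    """
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    # Divide the ratio-weighted sum by the sum of the ratios.
    return np.sum(rhos * returns) / np.sum(rhos)
```

For example, with `returns = [1.0, 0.0]` and `rhos = [2.0, 0.5]`, the ordinary estimate is `(2.0 * 1.0 + 0.5 * 0.0) / 2 = 1.0`, while the weighted estimate is `2.0 / 2.5 = 0.8`, which necessarily lies between the smallest and largest observed return.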
[1] H. Kahn, et al. Methods of Reducing Sample Size in Monte Carlo Computations, 1953, Oper. Res.
[2] J. Hammersley. Simulation and the Monte Carlo Method, 1982.
[3] Sigrún Andradóttir, et al. On the Choice of Alternative Measures in Importance Sampling with Markov Chains, 1995, Oper. Res.
[4] G. Casella, et al. Post-Processing Accept-Reject Samples: Recycling and Rescaling, 1998.
[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[6] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[7] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.
[8] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[9] Jun S. Liu, et al. Monte Carlo strategies in scientific computing, 2001.
[10] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[11] Christian R. Shelton, et al. Importance sampling for reinforcement learning with multiple objectives, 2001.
[12] Christian P. Robert, et al. Monte Carlo Statistical Methods, 2005, Springer Texts in Statistics.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Nir Friedman, et al. Probabilistic Graphical Models: Principles and Techniques, 2009.
[15] D. P. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, J. Comput. Appl. Math.
[16] Masashi Sugiyama, et al. Adaptive importance sampling for value function approximation in off-policy reinforcement learning, 2009, Neural Networks.
[17] Huizhen Yu. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[18] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[19] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[20] Masashi Sugiyama, et al. Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition, 2012, Neurocomputing.
[21] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[22] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[23] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[24] Luca Martino, et al. Advances in Importance Sampling, 2021, Wiley StatsRef: Statistics Reference Online.