[1] C. Glymour, et al. Statistics and Causal Inference, 1985.
[2] J. Robins, et al. Semiparametric regression estimation in the presence of dependent censoring, 1995.
[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[4] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.
[5] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[6] J. Pearl. Causality: Models, Reasoning and Inference, 2000.
[7] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[8] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML.
[9] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[10] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[11] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2002.
[12] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[13] Naoki Abe, et al. Sequential cost-sensitive decision making with reinforcement learning, 2002, KDD.
[14] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 1996, Machine Learning.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[16] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[17] Peter Stone, et al. Model-based function approximation in reinforcement learning, 2007, AAMAS '07.
[18] T. Moore. A Theory of Cramer-Rao Bounds for Constrained Parametric Models, 2010.
[19] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[20] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2011, WSDM '11.
[21] Guy Lever, et al. Modelling transition dynamics in MDPs with RKHS embeddings, 2012, ICML.
[22] Louis Wehenkel, et al. Batch mode reinforcement learning based on the synthesis of artificial trajectories, 2013, Ann. Oper. Res.
[23] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[24] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[25] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[26] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[27] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[28] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.
[29] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[30] Dirk Ormoneit, et al. Kernel-Based Reinforcement Learning, 2017, Encyclopedia of Machine Learning and Data Mining.