Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
Minmin Chen | Dale Schuurmans | Christopher K. Harris | Ramki Gummadi
[1] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[2] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[3] Thorsten Joachims, et al. Recommendations as Treatments: Debiasing Learning and Evaluation, 2016, ICML.
[4] Ambuj Tewari, et al. On the Consistency of Multiclass Classification Methods, 2007, J. Mach. Learn. Res.
[5] M. de Rijke, et al. Deep Learning with Logged Bandit Feedback, 2018, ICLR.
[6] Ingo Steinwart. How to Compare Different Loss Functions and Their Risks, 2007.
[7] Lorenzo Rosasco, et al. Are Loss Functions All the Same?, 2004, Neural Computation.
[8] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[9] Yu-Xiang Wang, et al. Imitation-Regularized Offline Learning, 2019, AISTATS.
[10] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[11] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, ArXiv.
[12] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[13] Ed H. Chi, et al. Top-K Off-Policy Correction for a REINFORCE Recommender System, 2018, WSDM.
[14] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[15] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[16] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[17] Csaba Szepesvári, et al. Cost-sensitive Multiclass Classification Risk Bounds, 2013, ICML.
[18] Maya R. Gupta, et al. Cost-sensitive multi-class classification from probability estimates, 2008, ICML '08.
[19] Hsuan-Tien Lin. Reduction from Cost-Sensitive Multiclass Classification to One-versus-One Binary Classification, 2014, ACML.
[20] Hans Ulrich Simon, et al. Robust Trainability of Single Neurons, 1995, J. Comput. Syst. Sci.
[21] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.
[22] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[23] Geoffrey E. Hinton, et al. Regularizing Neural Networks by Penalizing Confident Output Distributions, 2017, ICLR.
[24] Charles Elkan, et al. The Foundations of Cost-Sensitive Learning, 2001, IJCAI.
[25] Thorsten Joachims, et al. The Self-Normalized Estimator for Counterfactual Learning, 2015, NIPS.
[26] Zhengyuan Zhou, et al. Offline Multi-Action Policy Learning: Generalization and Optimization, 2018, Oper. Res.
[27] Lucas C. Parra, et al. Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds, 2010, J. Mach. Learn. Res.
[28] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[30] Stefan Riezler, et al. Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation, 2017, EMNLP.
[31] Mark D. Reid, et al. Information, Divergence and Risk for Binary Experiments, 2009, J. Mach. Learn. Res.
[32] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[33] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[34] Thorsten Joachims, et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.
[35] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[36] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016.
[37] Jin Tian, et al. Graphical Models for Inference with Missing Data, 2013, NIPS.
[38] M. de Rijke, et al. Large-scale Validation of Counterfactual Learning Methods: A Test-Bed, 2016, ArXiv.
[39] Tong Zhang, et al. Statistical Analysis of Some Multi-Category Large Margin Classification Methods, 2004, J. Mach. Learn. Res.
[40] John Langford, et al. An iterative method for multi-class cost-sensitive learning, 2004, KDD.
[41] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[42] Stefan Riezler, et al. Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback, 2018, ACL.
[43] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[44] Yuan Zhou, et al. Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy, 2019, ICLR.