Sequential Counterfactual Risk Minimization
[1] Antoine Chambaz, et al. Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning, 2021, NeurIPS.
[2] S. Athey, et al. Policy Learning with Adaptively Collected Data, 2021, Management Science.
[3] Alexandre d'Aspremont, et al. Acceleration Methods, 2021, Found. Trends Optim.
[4] Csaba Szepesvári, et al. CoinDICE: Off-Policy Confidence Interval Estimation, 2020, NeurIPS.
[5] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[6] Thorsten Joachims, et al. Off-policy Bandits with Deficient Support, 2020, KDD.
[7] J. Mairal, et al. Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation, 2020, arXiv:2004.11722.
[8] Yanjun Han, et al. Sequential Batch Learning in Finite-Action Linear Contextual Bandits, 2020, arXiv.
[9] David Simchi-Levi, et al. Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability, 2020, Math. Oper. Res.
[10] Daniele Calandriello, et al. Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification, 2020, ICML.
[11] Alexander Rakhlin, et al. Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles, 2020, ICML.
[12] Elena Smirnova, et al. Distributionally Robust Counterfactual Risk Minimization, 2019, AAAI.
[13] Vasilis Syrgkanis, et al. Semi-Parametric Efficient Policy Learning with Continuous Actions, 2019, NeurIPS.
[14] Yanjun Han, et al. Batched Multi-armed Bandits Problem, 2019, NeurIPS.
[15] Olivier Wintenberger, et al. Efficient online algorithms for fast-rate regret bounds under sparsity, 2018, NeurIPS.
[16] Nathan Kallus, et al. Policy Evaluation and Optimization with Continuous Treatments, 2018, AISTATS.
[17] Bernhard Schölkopf, et al. Elements of Causal Inference: Foundations and Learning Algorithms, 2017.
[18] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[19] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[20] Wouter M. Koolen, et al. MetaGrad: Multiple Learning Rates in Online Learning, 2016, NIPS.
[21] Marc G. Bellemare, et al. Q(λ) with Off-Policy Corrections, 2016, ALT.
[22] Thorsten Joachims, et al. The Self-Normalized Estimator for Counterfactual Learning, 2015, NIPS.
[23] Mark D. Reid, et al. Fast rates in statistical and online learning, 2015, J. Mach. Learn. Res.
[24] Vianney Perchet, et al. Batched Bandit Problems, 2015, COLT.
[25] Michael I. Jordan, et al. Trust Region Policy Optimization, 2015, ICML.
[26] Thorsten Joachims, et al. Counterfactual Risk Minimization, 2015, ICML.
[27] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, arXiv.
[28] A. Zeevi, et al. A Linear Response Bandit Problem, 2013.
[29] Nello Cristianini, et al. Finite-Time Analysis of Kernelised Contextual Bandits, 2013, UAI.
[30] Yurii Nesterov, et al. Gradient methods for minimizing composite functions, 2012, Mathematical Programming.
[31] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2012, J. Mach. Learn. Res.
[32] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[33] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[34] Emmanuel J. Candès, et al. Templates for convex cone problems with applications to sparse signal recovery, 2010, Math. Program. Comput.
[35] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[36] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[37] Csaba Szepesvári, et al. Tuning Bandit Algorithms in Stochastic Environments, 2007, ALT.
[38] Adrian S. Lewis, et al. The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems, 2006, SIAM J. Optim.
[39] Stephen P. Boyd, et al. Convex Optimization, 2004, IEEE Transactions on Automatic Control.
[40] Duan Li, et al. On Restart Procedures for the Conjugate Gradient Method, 2004, Numerical Algorithms.
[41] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[42] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[43] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[44] J. Mairal, et al. Efficient Kernelized UCB for Contextual Bandits, 2022, AISTATS.
[45] A. Gleave, et al. Stable-Baselines3: Reliable Reinforcement Learning Implementations, 2021, J. Mach. Learn. Res.
[46] Maximilian Kasy, et al. Supplement for: Adaptive treatment assignment in experiments for policy choice, 2020.
[47] B. Karrer, et al. AE: A domain-agnostic platform for adaptive experimentation, 2018.
[48] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[49] Thomas G. Dietterich. Adaptive computation and machine learning, 1998.
[50] S. Łojasiewicz. Sur la géométrie semi- et sous-analytique, 1993.