[1] Xinkun Nie, et al. Why adaptively collected data have negative bias and how to correct for it, 2017, AISTATS.
[2] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[3] Ambuj Tewari, et al. An Actor-Critic Contextual Bandit Algorithm for Personalized Mobile Health Interventions, 2017, arXiv.
[4] Nathan Kallus, et al. Balanced Policy Evaluation and Learning, 2017, NeurIPS.
[5] Tor Lattimore, et al. Causal Bandits: Learning Good Interventions via Causal Inference, 2016, NIPS.
[6] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[7] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[8] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[9] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[10] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[11] John Langford, et al. Making Contextual Decisions with Low Technical Debt, 2016.
[12] Elias Bareinboim, et al. Fairness in Decision-Making - The Causal Explanation Formula, 2018, AAAI.
[13] Wei Chu, et al. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, 2011.
[14] Susan Athey, et al. Recursive partitioning for heterogeneous causal effects, 2015, Proceedings of the National Academy of Sciences.
[15] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, arXiv.
[16] Elias Bareinboim, et al. Bandits with Unobserved Confounders: A Causal Approach, 2015, NIPS.
[17] Lihong Li, et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, arXiv.
[18] John Langford, et al. Practical Evaluation and Optimization of Contextual Bandit Algorithms, 2018, arXiv.
[19] Stefan Wager, et al. Efficient Policy Learning, 2017, arXiv.
[20] Liang Tang, et al. Personalized Recommendation via Parameter-Free Contextual Bandits, 2015, SIGIR.
[21] Leo Breiman. Random Forests, 2001, Machine Learning.
[22] Lihong Li, et al. Counterfactual Estimation and Optimization of Click Metrics for Search Engines, 2014, arXiv.
[23] Mohsen Bayati, et al. Online Decision-Making with High-Dimensional Covariates, 2015.
[24] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[25] Vianney Perchet, et al. The multi-armed bandit problem with covariates, 2011, arXiv.
[26] Elias Bareinboim, et al. Causal inference and the data-fusion problem, 2016, Proceedings of the National Academy of Sciences.
[27] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[28] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[29] Jack Bowden, et al. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges, 2015, Statistical Science.
[30] S. Athey, et al. Generalized random forests, 2016, The Annals of Statistics.
[31] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.
[32] A. Zeevi, et al. A Linear Response Bandit Problem, 2013.
[33] A. E. Hoerl, et al. Ridge Regression: Applications to Nonorthogonal Problems, 1970.
[34] Elias Bareinboim, et al. Counterfactual Data-Fusion for Online Reinforcement Learners, 2017, ICML.
[35] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[36] T. L. Lai, et al. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[37] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[38] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[39] Philippe Rigollet, et al. Nonparametric Bandits with Covariates, 2010, COLT.
[40] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[41] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[42] Aleksandrs Slivkins, et al. Contextual Bandits with Similarity Information, 2009, COLT.
[43] Vasilis Syrgkanis, et al. Accurate Inference for Adaptive Linear Models, 2017, ICML.
[44] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[45] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[46] Raphaël Féraud, et al. Random Forest for the Contextual Bandit Problem, 2015, AISTATS.
[46] Raphaël Féraud,et al. Random Forest for the Contextual Bandit Problem , 2015, AISTATS.
[47] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .