Ambuj Tewari | Ziping Xu | Yangyi Lu
[1] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[2] Y. Narahari,et al. Achieving Fairness in the Stochastic Multi-armed Bandit Problem , 2019, AAAI.
[3] Alessandro Lazaric,et al. Improved Algorithms for Conservative Exploration in Bandits , 2020, AAAI.
[4] B. Chakraborty,et al. mHealth app using machine learning to increase physical activity in diabetes and depression: clinical trial protocol for the DIAMANTE Study , 2020, BMJ Open.
[5] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[6] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[7] M. Panagopoulou,et al. Are the Origins of Precision Medicine Found in the Corpus Hippocraticum? , 2017, Molecular Diagnosis & Therapy.
[8] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[9] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[10] Aaron Roth,et al. Fairness in Learning: Classic and Contextual Bandits , 2016, NIPS.
[11] Xintao Wu,et al. Achieving User-Side Fairness in Contextual Bandits , 2020, Human-Centric Intelligent Systems.
[12] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[13] Haipeng Luo,et al. A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free , 2019, COLT.
[14] Thorsten Joachims,et al. Fairness of Exposure in Stochastic Bandits , 2021, ICML.
[15] Nenghai Yu,et al. Thompson Sampling for Budgeted Multi-Armed Bandits , 2015, IJCAI.
[16] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[17] Alicia R. Martin,et al. Clinical use of current polygenic risk scores may exacerbate health disparities , 2019, Nature Genetics.
[18] Archie C. Chapman,et al. Epsilon-First Policies for Budget-Limited Multi-Armed Bandits , 2010, AAAI.
[19] Elias Bareinboim,et al. Bandits with Unobserved Confounders: A Causal Approach , 2015, NIPS.
[20] Claudio Gentile,et al. A Gang of Bandits , 2013, NIPS.
[21] R. Srikant,et al. Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits , 2015, NIPS.
[22] Zhuoran Yang,et al. Is Pessimism Provably Efficient for Offline RL? , 2020, ICML.
[23] Matt J. Kusner,et al. Counterfactual Fairness , 2017, NIPS.
[24] Ashish Kapoor,et al. Safety-Aware Algorithms for Adversarial Contextual Bandit , 2017, ICML.
[25] Ürün Dogan,et al. Multi-Task Learning for Contextual Bandits , 2017, NIPS.
[26] Olivier Nicol,et al. Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques , 2014, ICML.
[27] Claire J. Tomlin,et al. Budget-Constrained Multi-Armed Bandits with Multiple Plays , 2017, AAAI.
[28] Tor Lattimore,et al. Optimally Confident UCB: Improved Regret for Finite-Armed Bandits , 2015, ArXiv.
[29] S. Murphy,et al. A "SMART" design for building individualized treatment sequences. , 2012, Annual review of clinical psychology.
[30] J. Paulus,et al. Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities , 2020, npj Digital Medicine.
[31] Ambuj Tewari,et al. Causal Bandits with Unknown Graph Structure , 2021, NeurIPS.
[32] Martin J. Wainwright,et al. Minimax Off-Policy Evaluation for Multi-Armed Bandits , 2021, IEEE Transactions on Information Theory.
[33] Shuai Li,et al. Collaborative Filtering Bandits , 2015, SIGIR.
[34] Christopher Jung,et al. Online Learning with an Unknown Fairness Metric , 2018, NeurIPS.
[35] Joaquin Quiñonero Candela,et al. Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..
[36] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[37] Assaf J. Zeevi,et al. A Note on Performance Limitations in Bandit Problems With Side Information , 2011, IEEE Transactions on Information Theory.
[38] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[39] Nikhil R. Devanur,et al. Bandits with concave rewards and convex knapsacks , 2014, EC.
[40] Yang Liu,et al. Calibrated Fairness in Bandits , 2017, ArXiv.
[41] Shuai Li,et al. Online Clustering of Bandits , 2014, ICML.
[42] John Langford,et al. Doubly Robust Policy Evaluation and Optimization , 2014, ArXiv.
[43] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[44] Alexander D'Amour,et al. A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data , 2019, ArXiv.
[45] Sanjeev R. Kulkarni,et al. Arbitrary side observations in bandit problems , 2005, Adv. Appl. Math..
[46] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[47] Gergely Neu,et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits , 2015, NIPS.
[48] Yifan Wu,et al. Conservative Bandits , 2016, ICML.
[49] Tor Lattimore,et al. Causal Bandits: Learning Good Interventions via Causal Inference , 2016, NIPS.
[50] Mehdi Boukhechba,et al. Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation , 2020, RecSys.
[51] Adel Javanmard,et al. Confidence intervals and hypothesis testing for high-dimensional regression , 2013, J. Mach. Learn. Res..
[52] Susan A. Murphy,et al. Statistical Inference with M-Estimators on Bandit Data , 2021, ArXiv.
[53] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[54] Mi Zhang,et al. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones , 2015, UbiComp.
[55] Yuhong Yang,et al. Randomized Allocation with Nonparametric Estimation for a Multi-armed Bandit Problem with Covariates , 2002 .
[56] Vineet Nair,et al. Budgeted and Non-budgeted Causal Bandits , 2020, AISTATS.
[57] Tao Qin,et al. Multi-Armed Bandit with Budget Constraint and Variable Costs , 2013, AAAI.
[58] Anupam Gupta,et al. Better Algorithms for Stochastic Bandits with Adversarial Corruptions , 2019, COLT.
[59] Haipeng Luo,et al. Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach , 2021, COLT.
[60] Purushottam Kar,et al. Corruption-tolerant bandit learning , 2018, Machine Learning.
[61] Xiaokui Xiao,et al. MOTS: Minimax Optimal Thompson Sampling , 2020, ArXiv.
[62] Lihong Li,et al. Toward Minimax Off-policy Value Estimation , 2015, AISTATS.
[63] Brian W. Powers,et al. Dissecting racial bias in an algorithm used to manage the health of populations , 2019, Science.
[64] Renato Paes Leme,et al. Stochastic bandits robust to adversarial corruptions , 2018, STOC.
[65] Thorsten Joachims,et al. Fairness of Exposure in Rankings , 2018, KDD.
[66] Ran Gilad-Bachrach,et al. PopTherapy: coping with stress through pop-culture , 2014, PervasiveHealth.
[67] Karen B. Farris,et al. The Potential Impact of Intelligent Systems for Mobile Health Self-Management Support: Monte Carlo Simulations of Text Message Support for Medication Adherence , 2014, Annals of behavioral medicine : a publication of the Society of Behavioral Medicine.
[68] A. Zeevi,et al. A Linear Response Bandit Problem , 2013 .
[69] Mohsen Bayati,et al. Online Decision-Making with High-Dimensional Covariates , 2015 .
[70] Sarah L Krein,et al. Patient-Centered Pain Care Using Artificial Intelligence and Mobile Health Tools: Protocol for a Randomized Study Funded by the US Department of Veterans Affairs Health Services Research and Development Program , 2016, JMIR research protocols.
[71] Ambuj Tewari,et al. Low-Rank Generalized Linear Bandit Problems , 2020, AISTATS.
[72] Tor Lattimore,et al. Refining the Confidence Level for Optimistic Bandit Strategies , 2018, J. Mach. Learn. Res..
[73] Wonyoung Kim,et al. Doubly Robust Thompson Sampling for linear payoffs , 2021, ArXiv.
[74] Ambuj Tewari,et al. From Ads to Interventions: Contextual Bandits in Mobile Health , 2017, Mobile Health - Sensors, Analytic Methods, and Applications.
[75] Yu Zhang,et al. A Survey on Multi-Task Learning , 2017, IEEE Transactions on Knowledge and Data Engineering.
[76] Peter Auer,et al. Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes , 2019, COLT.
[77] Michael Matthews,et al. The Alignment Problem: Machine Learning and Human Values , 2022, Personnel Psychology.
[78] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits. , 2009, COLT 2009.
[79] Elias Bareinboim,et al. Structural Causal Bandits with Non-Manipulable Variables , 2019, AAAI.
[80] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[81] Zheng Wen,et al. Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit , 2018, AISTATS.
[82] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[83] Csaba Szepesvári,et al. Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits , 2012, AISTATS.
[84] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[85] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .
[86] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[87] Alexandros G. Dimakis,et al. Identifying Best Interventions through Online Importance Sampling , 2017, ICML.
[88] Alessandro Lazaric,et al. Conservative Exploration in Reinforcement Learning , 2020, AISTATS.
[89] Yasin Abbasi-Yadkori,et al. The Elliptical Potential Lemma Revisited , 2020, ArXiv.
[90] Nicholas Mattei,et al. Group Fairness in Bandit Arm Selection , 2019, ArXiv.
[91] David Haussler,et al. The Probably Approximately Correct (PAC) and Other Learning Models , 1993 .
[92] Nicole Immorlica,et al. Adversarial Bandits with Knapsacks , 2018, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).
[93] Predrag Klasnja,et al. IntelligentPooling: practical Thompson sampling for mHealth , 2021, Mach. Learn..
[94] Wei Chu,et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.
[95] Benjamin Van Roy,et al. Conservative Contextual Linear Bandits , 2016, NIPS.
[96] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[97] Ambuj Tewari,et al. Optimizing mHealth Interventions with a Bandit , 2019, Studies in Neuroscience, Psychology and Behavioral Economics.
[98] Tor Lattimore,et al. High-Dimensional Sparse Linear Bandits , 2020, NeurIPS.
[99] Xingzhi Sun,et al. Reinforcement Learning for Clinical Decision Support in Critical Care: Comprehensive Review , 2020, Journal of medical Internet research.
[100] M. Clayton. Covariate Models for Bernoulli Bandits , 1989 .
[101] J. Sarkar. One-Armed Bandit Problems with Covariates , 1991 .
[102] Elias Bareinboim,et al. Structural Causal Bandits: Where to Intervene? , 2018, NeurIPS.
[103] Kristjan H. Greenewald,et al. Personalized HeartSteps , 2019, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..
[104] Sébastien Gerchinovitz,et al. Sparsity Regret Bounds for Individual Sequences in Online Linear Regression , 2011, COLT.
[105] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[106] Cun-Hui Zhang,et al. Adaptive Lasso for sparse high-dimensional regression models , 2008 .
[107] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[108] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[109] Moshe Tennenholtz,et al. Encouraging Physical Activity in Patients With Diabetes: Intervention Using a Reinforcement Learning System , 2017, Journal of medical Internet research.
[110] Aleksandrs Slivkins,et al. Bandits with Knapsacks , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[111] Massimiliano Pontil,et al. The Benefit of Multitask Representation Learning , 2015, J. Mach. Learn. Res..
[112] Archie C. Chapman,et al. Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits , 2012, AAAI.
[113] Omar Besbes,et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards , 2014, NIPS.
[114] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[115] Nikhil R. Devanur,et al. Linear Contextual Bandits with Knapsacks , 2015, NIPS.
[116] J. Robins,et al. Doubly Robust Estimation in Missing Data and Causal Inference Models , 2005, Biometrics.
[117] Ambuj Tewari,et al. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. , 2015, Health psychology : official journal of the Division of Health Psychology, American Psychological Association.
[118] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Adv. Appl. Math..
[119] Santiago Ontañón,et al. Can the artificial intelligence technique of reinforcement learning use continuously-monitored digital data to optimize treatment for weight loss? , 2018, Journal of Behavioral Medicine.
[120] Ambuj Tewari,et al. Regret Analysis of Bandit Problems with Causal Background Knowledge , 2019, UAI.
[121] H. Mamani,et al. How Do Tumor Cytogenetics Inform Cancer Treatments? Dynamic Risk Stratification and Precision Medicine Using Multi-armed Bandits , 2019, SSRN Electronic Journal.
[122] Sonia Jain,et al. A Bayesian‐bandit adaptive design for N‐of‐1 clinical trials , 2021, Statistics in medicine.