暂无分享,去创建一个
[1] J. Tropp. User-Friendly Tail Bounds for Matrix Martingales , 2011 .
[2] H. Vincent Poor,et al. Bandit problems with side observations , 2005, IEEE Transactions on Automatic Control.
[3] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[4] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[5] Assaf J. Zeevi,et al. Chasing Demand: Learning and Earning in a Changing Environment , 2016, Math. Oper. Res..
[6] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[7] Yifan Wu,et al. Conservative Bandits , 2016, ICML.
[8] Renato Paes Leme,et al. Feature-based Dynamic Pricing , 2016, EC.
[9] Vianney Perchet,et al. Online learning in repeated auctions , 2015, COLT.
[10] A. Zeevi,et al. Woodroofe's One-Armed Bandit Problem Revisited , 2009, 0909.0119.
[11] Mohsen Bayati,et al. Online Decision-Making with High-Dimensional Covariates , 2015 .
[12] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[13] Assaf J. Zeevi,et al. On Incomplete Learning and Certainty-Equivalence Control , 2017, Oper. Res..
[14] Vivek F. Farias,et al. Optimistic Gittins Indices , 2016, NIPS.
[15] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[16] E. L. Lehmann,et al. Theory of point estimation , 1950 .
[17] Josef Broder,et al. Dynamic Pricing Under a General Parametric Choice Model , 2012, Oper. Res..
[18] A. Zeevi,et al. A Linear Response Bandit Problem , 2013 .
[19] Martin J. Wainwright,et al. High-Dimensional Statistics , 2019 .
[20] John N. Tsitsiklis,et al. A Structured Multiarmed Bandit Problem and the Greedy Policy , 2008, IEEE Transactions on Automatic Control.
[21] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[22] Benjamin Van Roy,et al. Conservative Contextual Linear Bandits , 2016, NIPS.
[23] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[24] Philippe Rigollet,et al. Nonparametric Bandits with Covariates , 2010, COLT.
[25] Stephen E. Chick,et al. Bayesian Sequential Learning for Clinical Trials of Multiple Correlated Medical Interventions , 2018, Manag. Sci..
[26] Adel Javanmard. Perishability of Data: Dynamic Pricing under Varying-Coefficient Models , 2017, J. Mach. Learn. Res..
[27] Mohsen Bayati,et al. Dynamic Pricing with Demand Covariates , 2016, 1604.07463.
[28] Adel Javanmard,et al. Dynamic Pricing in High-Dimensions , 2016, J. Mach. Learn. Res..
[29] Thorsten Gerber,et al. Handbook Of Mathematical Functions , 2016 .
[30] T. Lai,et al. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems , 1982 .
[31] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[32] Edward S. Kim,et al. The BATTLE trial: personalizing therapy for lung cancer. , 2011, Cancer discovery.
[33] Vashist Avadhanula,et al. MNL-Bandit: A Dynamic Learning Approach to Assortment Selection , 2017, Oper. Res..
[34] Carlos Riquelme,et al. Online Active Linear Regression via Thresholding , 2016, AAAI.
[35] Bert Zwart,et al. Simultaneously Learning and Optimizing Using Controlled Variance Pricing , 2014, Manag. Sci..
[36] Lihong Li,et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits , 2017, ArXiv.
[37] Tor Lattimore,et al. Bounded Regret for Finite-Armed Structured Bandits , 2014, NIPS.
[38] Nathan Kallus,et al. Recursive Partitioning for Personalization using Observational Data , 2016, ICML.
[39] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[40] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[41] Nathan Kallus,et al. Policy Evaluation and Optimization with Continuous Treatments , 2018, AISTATS.
[42] Kani Chen,et al. Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs , 1999 .
[43] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable , 1979 .
[44] Benjamin Van Roy,et al. Learning to Optimize via Information-Directed Sampling , 2014, NIPS.
[45] Sanjeev R. Kulkarni,et al. Arbitrary side observations in bandit problems , 2005, Adv. Appl. Math..
[46] Assaf J. Zeevi,et al. Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies , 2014, Oper. Res..
[47] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[48] Arnoud V. den Boer. Tracking the market: Dynamic pricing and learning in a changing environment , 2015, Eur. J. Oper. Res..
[49] Bert Zwart,et al. Dynamic Pricing and Learning with Finite Inventories , 2013, Oper. Res..
[50] Ambuj Tewari,et al. From Ads to Interventions: Contextual Bandits in Mobile Health , 2017, Mobile Health - Sensors, Analytic Methods, and Applications.
[51] K. Narendra,et al. Persistent excitation in adaptive systems , 1987 .
[52] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[53] J. Sarkar. One-Armed Bandit Problems with Covariates , 1991 .
[54] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .
[55] Csaba Szepesvari,et al. Online learning for linearly parametrized control problems , 2012 .
[56] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[57] Nhan T. Nguyen. Model-Reference Adaptive Control , 2018 .
[58] R. Altman,et al. Estimation of the warfarin dose with clinical and pharmacogenetic data. , 2009, The New England journal of medicine.
[59] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.