Sequential Decision Making with Combinatorial Actions and High-Dimensional Contexts