[1] Felipe Caro, et al. Dynamic Assortment with Demand Learning for Seasonal Consumer Goods, 2007, Manag. Sci.
[2] Vashist Avadhanula, et al. Thompson Sampling for the MNL-Bandit, 2017, COLT.
[3] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[4] Philip M. Long, et al. Associative Reinforcement Learning using Linear Probabilistic Concepts, 1999, ICML.
[5] Elad Hazan, et al. Logarithmic Regret Algorithms for Online Convex Optimization, 2006, Machine Learning.
[6] Zheng Wen, et al. Cascading Bandits: Learning to Rank in the Cascade Model, 2015, ICML.
[7] P. Bartlett, et al. Local Rademacher Complexities, 2005, math/0508275.
[8] Xi Chen, et al. A Note on Tight Lower Bound for MNL-Bandit Assortment Selection Models, 2017, arXiv.
[9] Beibei Li, et al. Examining the Impact of Ranking on Consumer Behavior and Search Engine Revenue, 2013, Manag. Sci.
[10] Xi Chen, et al. Dynamic Assortment Optimization with Changing Contextual Information, 2018, J. Mach. Learn. Res.
[11] Daniel McFadden, et al. Modelling the Choice of Residential Location, 1977.
[12] Huseyin Topaloglu, et al. Assortment Optimization Under Variants of the Nested Logit Model, 2014, Oper. Res.
[13] Vashist Avadhanula, et al. A Near-Optimal Exploration-Exploitation Approach for Assortment Selection, 2016, EC.
[14] Craig Boutilier, et al. Randomized Exploration in Generalized Linear Bandits, 2019, AISTATS.
[15] Vashist Avadhanula, et al. MNL-Bandit: A Dynamic Learning Approach to Assortment Selection, 2017, Oper. Res.
[16] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[17] E. L. Lehmann, et al. Theory of Point Estimation, 1950.
[18] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[19] Adel Javanmard, et al. Dynamic Pricing in High-Dimensions, 2016, J. Mach. Learn. Res.
[20] Branislav Kveton, et al. Efficient Learning in Large-Scale Combinatorial Semi-Bandits, 2014, ICML.
[21] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[22] Rong Jin, et al. Multinomial Logit Bandit with Linear Utility Functions, 2018, IJCAI.
[23] Lihong Li, et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, arXiv.
[24] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[25] Wei Cao, et al. On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs, 2015, NIPS.
[26] D. Pollard. Empirical Processes: Theory and Applications, 1990.
[27] Danny Segev, et al. Greedy-Like Algorithms for Dynamic Assortment Planning Under Multinomial Logit Preferences, 2015, Oper. Res.
[28] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[29] Zhi-Hua Zhou, et al. Online Stochastic Linear Optimization under One-bit Feedback, 2015, ICML.
[30] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[31] Zheng Wen, et al. Cascading Bandits for Large-Scale Recommendation Problems, 2016, UAI.
[32] David Simchi-Levi, et al. Thompson Sampling for Online Personalized Assortment Optimization Problems with Multinomial Logit Choice Models, 2017.
[33] David B. Shmoys, et al. Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint, 2010, Oper. Res.
[34] G. Gallego, et al. Assortment Planning Under the Multinomial Logit Model with Totally Unimodular Constraint Structures, 2013.
[35] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[36] Assaf J. Zeevi, et al. Optimal Dynamic Assortment Planning with Demand Learning, 2013, Manuf. Serv. Oper. Manag.
[37] Xiaoyan Zhu, et al. Contextual Combinatorial Bandit and its Application on Diversified Online Recommendation, 2014, SDM.
[38] Min-hwan Oh, et al. Thompson Sampling for Multinomial Logit Contextual Bandits, 2019, NeurIPS.
[39] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[40] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[41] Elad Hazan, et al. Logistic Regression: Tight Bounds for Stochastic and Online Optimization, 2014, COLT.
[42] Renyuan Xu, et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays, 2019, NeurIPS.
[43] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.