Improving multi-armed bandit algorithms in online pricing settings
Marcello Restelli | Nicola Gatti | Francesco Trovò | Stefano Paladino
[1] Csaba Szepesvári,et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits , 2009, Theor. Comput. Sci..
[2] Csaba Szepesvári,et al. An adaptive algorithm for finite stochastic partial monitoring , 2012, ICML.
[3] Christian Schindelhauer,et al. Discrete Prediction Games with Arbitrary Feedback and Loss , 2001, COLT/EuroCOLT.
[4] Vijay Kumar,et al. Online learning in online auctions , 2003, SODA '03.
[5] Richard Cole,et al. The sample complexity of revenue maximization , 2014, STOC.
[6] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[7] Csaba Szepesvári,et al. Partial Monitoring - Classification, Regret Bounds, and Algorithms , 2014, Math. Oper. Res..
[8] Shie Mannor,et al. Unimodal Bandits , 2011, ICML.
[9] Csaba Szepesvári,et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments , 2011, COLT.
[10] Dean P. Foster,et al. No Internal Regret via Neighborhood Watch , 2011, AISTATS.
[11] H. Chernoff. A Note on an Inequality Involving the Normal Distribution , 1981.
[12] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[13] Noga Alon,et al. From Bandits to Experts: A Tale of Domination and Independence , 2013, NIPS.
[14] Alessandro Lazaric,et al. A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities , 2012, EC '12.
[15] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963.
[16] Nicolò Cesa-Bianchi,et al. Regret Minimization Under Partial Monitoring , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.
[17] Fan Chung Graham,et al. Concentration Inequalities and Martingale Inequalities: A Survey , 2006, Internet Math..
[18] Assaf J. Zeevi,et al. Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies , 2014, Oper. Res..
[19] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985.
[20] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[21] Jialin Liu,et al. Differential Evolution algorithm applied to non-stationary bandit problem , 2014, 2014 IEEE Congress on Evolutionary Computation (CEC).
[22] Shie Mannor,et al. From Bandits to Experts: On the Value of Side-Observations , 2011, NIPS.
[23] Gábor Lugosi,et al. Prediction, learning, and games , 2006.
[24] Aurélien Garivier,et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems , 2008, arXiv:0805.3415.
[25] Peter S. Fader,et al. Dynamic Conversion Behavior at E-Commerce Sites , 2004, Manag. Sci..
[26] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[27] Josef Broder,et al. Dynamic Pricing Under a General Parametric Choice Model , 2012, Oper. Res..
[28] Alexandre Proutière,et al. Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms , 2014, ICML.
[29] Omar Besbes,et al. Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms , 2009, Oper. Res..
[30] Sanmay Das,et al. Learning the demand curve in posted-price digital goods auctions , 2011, AAMAS.
[31] Frank Thomson Leighton,et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..
[32] Tim Roughgarden,et al. On the Pseudo-Dimension of Nearly Optimal Auctions , 2015, NIPS.
[33] Omar Besbes,et al. On the (Surprising) Sufficiency of Linear Models for Dynamic Pricing with Demand Learning , 2014, Manag. Sci..
[34] Hsing Kenneth Cheng,et al. Free Trial or No Free Trial: Optimal Software Product Design with Network Effects , 2010, Eur. J. Oper. Res..