Inventory Balancing with Online Learning

We study a general problem of allocating limited resources to heterogeneous customers over time, under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources and returns different rewards for the resources consumed. We consider a general model framework in which the resource consumption distribution associated with each (customer type, action) combination is not known, but is consistent and can be learned over time. In addition, the sequence of customer types that arrive over time is arbitrary and completely unknown. We achieve near-optimality under both model uncertainty and customer heterogeneity by combining two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types that could arrive later; and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions. We define an auxiliary problem that allows existing competitive ratio and regret bounds to be integrated seamlessly. Furthermore, we show that the performance guarantee generated by our framework is tight, using the special case of the online bipartite matching problem with unknown match probabilities. Finally, we demonstrate the practicality and efficacy of algorithms generated by our framework using a publicly available hotel data set.
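
To make the combination of the two frameworks concrete, the following is a minimal sketch for the special case of online bipartite matching with unknown match probabilities. It is an illustration under stated assumptions, not the algorithm or the analysis from the paper: the exponential penalty function, the UCB-style confidence bonus, and all names (`penalty`, `ucb`, `run`, and their parameters) are hypothetical choices made here for the example. Each arriving customer is offered the resource that maximizes reward times an optimistic match-probability estimate times a penalty that shrinks as the resource's inventory is depleted.

```python
import math
import random


def penalty(remaining_fraction):
    """Assumed exponential balancing penalty: 0 when empty, 1 when full."""
    return (math.e / (math.e - 1)) * (1 - math.exp(-remaining_fraction))


def ucb(successes, trials, t):
    """Optimistic estimate of a match probability (assumed UCB1-style bonus)."""
    if trials == 0:
        return 1.0  # untried pairs are treated optimistically
    bonus = math.sqrt(2 * math.log(t + 1) / trials)
    return min(1.0, successes / trials + bonus)


def run(arrivals, capacity, reward, true_prob, seed=0):
    """Simulate the sketch on a sequence of customer types.

    arrivals:  list of customer types, in arrival order
    capacity:  dict resource -> starting inventory (positive int)
    reward:    dict resource -> reward per unit sold
    true_prob: dict (customer type, resource) -> match probability,
               hidden from the policy and used only to simulate outcomes
    """
    rng = random.Random(seed)
    remaining = dict(capacity)
    successes, trials = {}, {}
    total = 0.0
    for t, ctype in enumerate(arrivals, start=1):
        best, best_score = None, 0.0
        for res in capacity:
            if remaining[res] == 0:
                continue  # nothing left to offer
            key = (ctype, res)
            p_hat = ucb(successes.get(key, 0), trials.get(key, 0), t)
            # Discount the reward by how depleted the resource is.
            score = reward[res] * p_hat * penalty(remaining[res] / capacity[res])
            if score > best_score:
                best, best_score = res, score
        if best is None:
            continue  # every resource is exhausted
        key = (ctype, best)
        trials[key] = trials.get(key, 0) + 1
        if rng.random() < true_prob[key]:  # simulated match outcome
            successes[key] = successes.get(key, 0) + 1
            remaining[best] -= 1
            total += reward[best]
    return total, remaining
```

In this sketch the penalty term plays the "reserving" role and the confidence bonus plays the "exploring" role; other penalty functions or learning rules could be substituted without changing the overall structure.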
