Two-Phase $Q$-learning for Bidding-based Vehicle Sharing

We consider one-way vehicle sharing systems in which customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars and the quality of service by pricing rentals appropriately. We propose a bidding approach that is inspired by auctions and accounts for the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, request times, and trip durations). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to accept each one. The operator can even accept negative bids to motivate drivers to rebalance available cars toward unpopular destinations within a city. We model the operator's sequential decision-making problem as a \emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two-phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.
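
For orientation, a generic discounted CMDP and the standard tabular $Q$-learning update can be written as below. This is a textbook-style sketch, not the formulation or the two-phase algorithm of the paper, neither of which is specified in the abstract. All symbols are assumptions introduced here for illustration: state $s_t$ (for example, the car distribution together with the current bid), action $a_t \in \{\text{accept}, \text{reject}\}$, reward $r$, constraint cost $c$, budget $d$, discount factor $\gamma$, and step size $\alpha_t$.

```latex
% Generic discounted CMDP (assumed notation; not taken from the paper).
\begin{align*}
  \max_{\pi} \quad & \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right] \\
  \text{s.t.} \quad & \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le d .
\end{align*}

% Standard (unconstrained) tabular Q-learning update, shown only as the
% building block that the name "two-phase Q-learning" alludes to.
\[
  Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha_t \Bigl[ r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr].
\]
```

In this reading, the constraint would encode a quality-of-service requirement, while the reward would capture the revenue from accepted bids; how the two phases handle the constraint is left to the full paper.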
