Two Phase $Q$-learning for Bidding-based Vehicle Sharing

We consider one-way vehicle sharing systems, in which customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars and the quality of service by pricing rentals appropriately. We propose a bidding approach that is inspired by auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, request times, and trip durations). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether or not to accept each rental. The operator can even accept negative bids, i.e., pay drivers, to motivate them to rebalance available cars toward unpopular destinations within a city. We model the operator's sequential decision-making problem as a constrained Markov decision problem (CMDP), and we propose and rigorously analyze a novel two phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.
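The abstract does not spell out the two phase algorithm, so what follows is only a minimal sketch of the underlying idea under stated assumptions. It uses a hypothetical two-station toy with `N_CARS` cars, discrete bid levels `BIDS` (including a negative one), an asymmetric demand model, and a constraint cost of 1 whenever a station is left empty. It does not implement the authors' two phase method; instead it swaps in plain tabular $Q$-learning on the Lagrangian-relaxed reward $r_t - \lambda c_t$ for a fixed multiplier $\lambda \ge 0$, a standard way to turn the CMDP "maximize $\mathbb{E}[\sum_t \gamma^t r_t]$ subject to $\mathbb{E}[\sum_t \gamma^t c_t] \le d$" into an unconstrained problem. All names, parameters and the demand model are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy instance (not from the paper): two stations, N_CARS cars in total.
N_CARS = 4
BIDS = [-1.0, 0.0, 1.0, 2.0]  # discrete bid levels; a negative bid means the operator pays
GAMMA = 0.95      # discount factor
ALPHA = 0.1       # learning rate
EPSILON = 0.1     # exploration rate
ACTIONS = (0, 1)  # 0 = reject the bid, 1 = accept it


def sample_request(rng):
    """Draw a hypothetical request (origin station, bid); station 0 is the busier origin."""
    origin = 0 if rng.random() < 0.7 else 1
    return origin, rng.choice(BIDS)


def step(cars_at_0, origin, bid, accept):
    """Apply the operator's decision and return (new cars_at_0, reward, cost)."""
    available = cars_at_0 if origin == 0 else N_CARS - cars_at_0
    reward = 0.0
    if accept and available > 0:
        reward = bid  # revenue, or a payment to the driver if the bid is negative
        cars_at_0 += -1 if origin == 0 else 1  # the car moves to the other station
    # Constraint cost: 1 whenever one of the two stations is left without cars.
    cost = 1.0 if cars_at_0 in (0, N_CARS) else 0.0
    return cars_at_0, reward, cost


def greedy(Q, cars_at_0, origin, bid):
    """Greedy action for the augmented state (cars at station 0, origin, bid)."""
    state = (cars_at_0, origin, bid)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def lagrangian_q_learning(lam, episodes=200, horizon=100, seed=0):
    """Tabular epsilon-greedy Q-learning on the penalized reward r - lam * c (fixed lam)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        cars = N_CARS // 2
        origin, bid = sample_request(rng)
        for _ in range(horizon):
            state = (cars, origin, bid)
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q, cars, origin, bid)
            cars, r, c = step(cars, origin, bid, action)
            origin, bid = sample_request(rng)  # the next customer arrives
            next_state = (cars, origin, bid)
            target = (r - lam * c) + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return Q


if __name__ == "__main__":
    Q = lagrangian_q_learning(lam=5.0)
    # A negative bid for a trip from station 1 back to the busier station 0 (a rebalancing trip).
    print(greedy(Q, cars_at_0=1, origin=1, bid=-1.0))
```

A complete CMDP solver would also adapt $\lambda$, for example by increasing it while the estimated discounted cost exceeds the budget $d$; here $\lambda$ is held fixed to keep the sketch short, and raising it simply makes leaving a station empty more expensive in the learned $Q$-values.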

Marco Pavone | Jia Yuan Yu | Yinlam Chow
