InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms
Bo An | Jieping Ye | Hongtu Zhu | Hao Li | Haipeng Chen | Xiaocheng Tang | Zhiwei Qin | Yan Jiao
[1] Pingzhong Tang,et al. Optimal Vehicle Dispatching Schemes via Dynamic Pricing , 2017, ArXiv.
[2] D. Woodard,et al. Dynamic pricing and matching in ride‐hailing platforms , 2018, Naval Research Logistics (NRL).
[3] Zhe Xu,et al. A Deep Value-network Based Approach for Multi-Driver Order Dispatching , 2019, KDD.
[4] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[5] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] R. Johari,et al. Pricing in Ride-Share Platforms: A Queueing-Theoretic Approach , 2015 .
[8] Jieping Ye,et al. Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching , 2018, 2018 IEEE International Conference on Data Mining (ICDM).
[9] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[10] J. Munkres. Algorithms for the Assignment and Transportation Problems , 1957 .
[11] Thodoris Lykouris,et al. Pricing and Optimization in Shared Vehicle Systems: An Approximation Framework , 2016, EC.
[12] Jieping Ye,et al. A Taxi Order Dispatch Model based On Combinatorial Optimization , 2017, KDD.
[13] Carlos Riquelme,et al. Pricing in Ride-Sharing Platforms: A Queueing-Theoretic Approach , 2015, EC.
[14] Ziqi Liao,et al. Real-time taxi dispatching using Global Positioning Systems , 2003, CACM.
[15] Raphaël Féraud,et al. A Neural Networks Committee for the Contextual Bandit Problem , 2014, ICONIP.
[16] Arthur E. Hoerl,et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.
[17] Shuai Li,et al. Distributed Clustering of Linear Bandits in Peer to Peer Networks , 2016, ICML.
[18] Shuai Li,et al. Collaborative Filtering Bandits , 2015, SIGIR.
[19] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[20] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[21] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[22] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[23] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[24] Nello Cristianini,et al. Finite-Time Analysis of Kernelised Contextual Bandits , 2013, UAI.
[25] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[26] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[27] M. Keith Chen,et al. Dynamic Pricing in a Labor Market: Surge Pricing and Flexible Work on the Uber Platform , 2016, EC.
[28] Dawn B. Woodard,et al. Dynamic pricing and matching in ride‐hailing platforms , 2019, Naval Research Logistics (NRL).
[29] Christopher S. Tang,et al. Coordinating Supply and Demand on an On-Demand Service Platform with Impatient Customers , 2017, Manuf. Serv. Oper. Manag..
[30] Jun Wang,et al. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning , 2019, WWW.
[31] Marco Pavone,et al. Control of robotic mobility-on-demand systems: A queueing-theoretical perspective , 2014, Int. J. Robotics Res..
[32] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .
[33] Steven D. Levitt,et al. Using Big Data to Estimate Consumer Surplus: The Case of Uber , 2016 .
[34] E. Glen Weyl,et al. Surge Pricing Solves the Wild Goose Chase , 2017, EC.
[35] Zhe Xu,et al. Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach , 2018, KDD.
[36] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[37] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[38] Hai Yang,et al. Nonlinear pricing of taxi services , 2010 .
[39] David C. Parkes,et al. Spatio-Temporal Pricing for Ridesharing Platforms , 2018, EC.