DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation

To alleviate traffic congestion in urban areas, electronic toll collection (ETC) systems have been deployed all over the world. Despite their merits, tolls are usually pre-determined and fixed from day to day, which fails to account for traffic dynamics and thus limits their regulatory effect when traffic conditions are abnormal. In this paper, we propose a novel dynamic ETC (DyETC) scheme that adjusts tolls to traffic conditions in real time. The DyETC problem is formulated as a Markov decision process (MDP), whose solution is very challenging due to its 1) multi-dimensional state space, 2) multi-dimensional, continuous, and bounded action space, and 3) time-dependent state and action values. Due to this complexity, existing methods cannot be applied to our problem. We therefore develop a novel algorithm, PG-β, which makes three improvements to the traditional policy gradient method: 1) time-dependent value and policy functions, 2) a Beta-distribution policy function, and 3) state abstraction. Experimental results show that, compared with existing ETC schemes, DyETC increases traffic volume by around 8% and reduces travel time by around 14.6% during rush hour. Given the total traffic volume in a traffic network, this amounts to a substantial increase in social welfare.
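
The abstract does not spell out the implementation, but the core idea of a Beta-distribution policy for bounded, continuous actions can be illustrated with a short sketch. The following PyTorch snippet is our own minimal illustration, not the authors' code: the names (BetaPolicy, sample_tolls), the network sizes, the toll range, and the use of plain REINFORCE are all assumptions made for the example. A Beta distribution has support on (0, 1), so a sampled action can be rescaled to any bounded toll interval without clipping; conditioning on the time step is one simple way to obtain a time-dependent policy.

```python
# Illustrative sketch of a Beta-distribution policy for bounded tolls.
# Not the paper's PG-β implementation; all names and sizes are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Beta

class BetaPolicy(nn.Module):
    """Map a traffic state and a time step to Beta parameters (alpha, beta)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        # +1 input for the time step: conditioning on t is one simple way
        # to realize a time-dependent policy.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * action_dim),
        )
        self.action_dim = action_dim

    def forward(self, state, t):
        x = torch.cat([state, t], dim=-1)
        # softplus(.) + 1 keeps alpha, beta > 1, giving a unimodal density.
        params = nn.functional.softplus(self.net(x)) + 1.0
        alpha, beta = params.split(self.action_dim, dim=-1)
        return Beta(alpha, beta)

def sample_tolls(policy, state, t, toll_min=0.0, toll_max=6.0):
    """Sample per-road tolls in [toll_min, toll_max] and their log-probability."""
    dist = policy(state, t)
    u = dist.sample()                              # u in (0, 1) per dimension
    log_prob = dist.log_prob(u).sum(-1)            # score term for REINFORCE
    tolls = toll_min + (toll_max - toll_min) * u   # rescale to the toll bounds
    return tolls, log_prob

# One REINFORCE-style update; the episode return G is a placeholder.
policy = BetaPolicy(state_dim=8, action_dim=3)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
state, t = torch.randn(8), torch.tensor([0.5])
tolls, log_prob = sample_tolls(policy, state, t)
G = torch.tensor(2.0)                              # placeholder return
loss = -log_prob * G                               # maximize expected return
opt.zero_grad(); loss.backward(); opt.step()
```

Because the Beta density is defined only on the action bounds, the policy never proposes infeasible tolls, avoiding the bias that clipping a Gaussian policy would introduce.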
