Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control

Flow is a new computational framework built to address a key need created by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles operating amid the complex nonlinear dynamics of traffic. Leveraging recent advances in deep reinforcement learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control, and it supports benchmarking classical (including hand-designed) controllers against learned policies (control laws). Flow integrates the traffic microsimulator SUMO with the deep reinforcement learning library rllab and makes it easy to design traffic tasks, including different network configurations and vehicle dynamics. We use Flow to develop reliable controllers for complex problems, such as controlling mixed-autonomy traffic (involving both autonomous and human-driven vehicles) on a ring road. For this problem, we first show that state-of-the-art hand-designed controllers excel in-distribution but fail to generalize; we then show that even simple neural network policies can solve the stabilization task across density settings and generalize to out-of-distribution settings.
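To make the ring-road task concrete, the following is a minimal Python sketch of the kind of environment a framework like Flow exposes: a single-lane ring where human-driven vehicles follow the Intelligent Driver Model (IDM) and one autonomous vehicle is driven by a policy that maps an observation to an acceleration. It is not Flow's, SUMO's, or rllab's API. The names `RingRoadEnv`, `idm_accel`, `rollout`, `human_like`, and `capped_speed` are hypothetical, and the IDM parameter values, the 230 m ring with 22 vehicles, the observation (own speed, leader speed, headway), and the mean-speed reward are all assumptions chosen for illustration. A few lines of semi-implicit Euler integration stand in for the microsimulator.

```python
import math

VEH_LEN = 5.0  # assumed vehicle length in meters


def idm_accel(v, v_lead, gap, v0=30.0, T=1.0, a=1.0, b=1.5, delta=4.0, s0=2.0):
    """Intelligent Driver Model acceleration (m/s^2); parameter values are assumed."""
    s_star = s0 + max(0.0, v * T + v * (v - v_lead) / (2.0 * math.sqrt(a * b)))
    # Clamp the gap away from zero so the interaction term stays finite.
    return a * (1.0 - (v / v0) ** delta - (s_star / max(gap, 0.1)) ** 2)


class RingRoadEnv:
    """Hypothetical single-lane ring road; vehicle 0 is the autonomous vehicle."""

    def __init__(self, length=230.0, n_vehicles=22, dt=0.1, horizon=3000):
        self.length, self.n, self.dt, self.horizon = length, n_vehicles, dt, horizon
        self.reset()

    def reset(self):
        spacing = self.length / self.n
        self.pos = [i * spacing for i in range(self.n)]  # evenly spaced, at rest
        self.vel = [0.0] * self.n
        self.t = 0
        return self._observe()

    def _gap(self, i):
        # Bumper-to-bumper distance to the vehicle ahead, wrapping around the ring.
        lead = (i + 1) % self.n
        return (self.pos[lead] - self.pos[i]) % self.length - VEH_LEN

    def _observe(self):
        # Observation: AV speed, leader speed, AV headway.
        return (self.vel[0], self.vel[1 % self.n], self._gap(0))

    def step(self, av_accel):
        # Compute all accelerations from the current state before moving anyone.
        accels = [av_accel] + [
            idm_accel(self.vel[i], self.vel[(i + 1) % self.n], self._gap(i))
            for i in range(1, self.n)
        ]
        # Semi-implicit Euler: update speed first, then position with the new speed.
        for i in range(self.n):
            self.vel[i] = max(0.0, self.vel[i] + accels[i] * self.dt)
            self.pos[i] = (self.pos[i] + self.vel[i] * self.dt) % self.length
        self.t += 1
        reward = sum(self.vel) / self.n  # mean speed of all vehicles
        return self._observe(), reward, self.t >= self.horizon


def rollout(env, policy):
    """Run one episode and return the average per-step reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total / env.horizon


def human_like(obs):
    # Baseline: the AV simply behaves like the IDM human drivers.
    return idm_accel(*obs)


def capped_speed(obs, v_target=3.0, k=0.5):
    # Hypothetical hand-designed controller: proportional speed tracking,
    # never more aggressive than IDM would be at the same headway.
    return min(idm_accel(*obs), k * (v_target - obs[0]))


if __name__ == "__main__":
    env = RingRoadEnv()
    print("human-like AV:", rollout(env, human_like))
    print("capped-speed AV:", rollout(env, capped_speed))
```

The same `rollout` function scores a hand-designed controller and any other policy alike, which mirrors the side-by-side benchmarking described above. A policy-gradient method would replace the hand-written `policy` with a parameterized one (for example, a small neural network) and update its parameters from the collected rewards.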
