Modeling Cyber-Physical Human Systems via an Interplay Between Reinforcement Learning and Game Theory

Predicting the outcomes of cyber-physical systems in which multiple humans interact is a challenging problem. This article reviews a game-theoretic approach to this problem, in which reinforcement learning is used to predict the time-extended interaction dynamics. The most attractive feature of the method is that it offers a computationally feasible way to model multiple humans simultaneously as decision makers, rather than determining the decision dynamics of only the intelligent agent of interest and forcing the others to obey kinematic and dynamic constraints imposed by the environment. We present two recent applications of the method: modeling 1) unmanned aircraft integration into the National Airspace System and 2) highway traffic. We conclude by discussing ongoing and future work on employing, improving, and validating the method, along with related open problems and research opportunities.
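
As a rough illustration of how game theory and reinforcement learning can be combined to model several decision makers at once, the sketch below trains a small hierarchy of agents in which each level learns a best response to the level beneath it. It is a toy under stated assumptions, not the article's implementation: the abstract does not specify the hierarchy, so a level-k style scheme is assumed here, and the two-action "yield"/"go" conflict, the reward values, and the names `reward` and `train_best_response` are all hypothetical. For brevity the time-extended problem is collapsed to a single state, so the update is the degenerate, non-bootstrapped case of tabular Q-learning.

```python
import random

ACTIONS = ("yield", "go")  # hypothetical two-action conflict (e.g. a merge)


def reward(own, other):
    """Assumed payoffs: a collision is costly, going alone pays a little."""
    if own == "go" and other == "go":
        return -10.0  # both agents go: conflict
    if own == "go":
        return 1.0    # agent goes while the other yields
    return 0.0        # yielding is safe but earns nothing


def train_best_response(opponent_policy, episodes=2000, alpha=0.1,
                        epsilon=0.2, seed=0):
    """Learn a policy against a fixed opponent with a single-state value update."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=q.get)
        r = reward(action, opponent_policy())
        # Single-state update: there is no next state to bootstrap from.
        q[action] += alpha * (r - q[action])
    best = max(ACTIONS, key=q.get)
    return lambda: best


# Level 0 is non-strategic; each higher level learns against the one below.
level0 = lambda: "go"
level1 = train_best_response(level0)
level2 = train_best_response(level1)

print("level 1 chooses:", level1())
print("level 2 chooses:", level2())
```

Because every agent in the hierarchy is itself a learned decision maker, a scenario can be populated with a mix of levels instead of scripting all but one agent. A time-extended version would replace the single value table with a state-indexed one and add the discounted next-state term to the update.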
