Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state

Traditionally, autonomous cars treat human-driven vehicles like moving obstacles: they predict the future trajectories of those vehicles and plan to stay out of their way. While physically safe, this results in defensive and opaque behaviors. In reality, an autonomous car's actions affect what other cars do in response, creating an opportunity for coordination. Our thesis is that we can leverage these responses to plan more efficient and communicative behaviors. We introduce a formulation of interaction with human-driven vehicles as an underactuated dynamical system, in which the robot's actions have consequences on the state of the autonomous car, but also on the human's actions and thus on the state of the human-driven car. We model these consequences by approximating the human's actions as (noisily) optimal with respect to some utility function. The robot uses the human's actions as observations of the human's underlying utility function parameters. We first explore learning these parameters offline, and show that a robot planning in the resulting underactuated system is more efficient than one that treats the person as a moving obstacle. We also show that the robot can target specific desired effects, like getting the person to switch lanes or to proceed first through an intersection. We then explore estimating these parameters online, and enable the robot to perform active information gathering: generating actions that purposefully probe the human in order to clarify the human's underlying utility parameters, like driving style or attention level. We show that this significantly outperforms passive estimation and improves efficiency. Planning in our model results in coordination behaviors: the robot inches forward at an intersection to see whether it can go through, or it reverses to make the other car proceed first. These behaviors result from the optimization, without relying on hand-coded signaling strategies. Our user studies support the utility of our model when interacting with real users.
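As a rough illustration of the online estimation and active information gathering described above, the following Python sketch maintains a belief over a discrete human parameter under a noisily optimal (Boltzmann) action model, and picks the robot action with the lowest expected posterior entropy. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two driving styles, the action sets, the utility values, the rationality coefficient `beta`, and all function names. The paper itself works with continuous trajectories; this is a discrete toy version of the same idea.

```python
import math

# Hypothetical discrete setup (not from the paper).
THETAS = ["attentive", "distracted"]          # candidate human internal states
HUMAN_ACTIONS = ["brake", "keep_speed", "accelerate"]
ROBOT_ACTIONS = ["stay", "nudge"]             # "nudge" = inch toward the human's lane


def human_utility(theta, robot_action, human_action):
    """Toy utility table, invented for illustration only."""
    if theta == "attentive" and robot_action == "nudge":
        # An attentive driver prefers to yield when the robot nudges in.
        return {"brake": 2.0, "keep_speed": 0.0, "accelerate": -1.0}[human_action]
    # Otherwise the driver simply prefers to keep the current speed.
    return {"brake": 0.0, "keep_speed": 1.0, "accelerate": 0.0}[human_action]


def action_likelihood(theta, robot_action, human_action, beta=1.0):
    """P(human_action | theta, robot_action) under a Boltzmann (noisily optimal) model."""
    weights = {a: math.exp(beta * human_utility(theta, robot_action, a))
               for a in HUMAN_ACTIONS}
    return weights[human_action] / sum(weights.values())


def belief_update(belief, robot_action, human_action):
    """Bayes rule: posterior over theta after observing the human's response."""
    unnorm = {t: belief[t] * action_likelihood(t, robot_action, human_action)
              for t in belief}
    total = sum(unnorm.values())
    return {t: p / total for t, p in unnorm.items()}


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)


def expected_posterior_entropy(belief, robot_action):
    """Average posterior entropy over the human responses the belief predicts."""
    expected = 0.0
    for h in HUMAN_ACTIONS:
        # Marginal probability of observing response h under the current belief.
        p_h = sum(belief[t] * action_likelihood(t, robot_action, h) for t in belief)
        if p_h > 0.0:
            expected += p_h * entropy(belief_update(belief, robot_action, h))
    return expected


def choose_probing_action(belief, robot_actions=ROBOT_ACTIONS):
    """Pick the robot action expected to reduce uncertainty about theta the most."""
    return min(robot_actions, key=lambda r: expected_posterior_entropy(belief, r))


if __name__ == "__main__":
    prior = {"attentive": 0.5, "distracted": 0.5}
    probe = choose_probing_action(prior)
    posterior = belief_update(prior, probe, "brake")
    print(probe, posterior)
```

In this toy setup, staying in lane produces the same predicted response for both driver types and so reveals nothing, whereas nudging separates them. A full planner would trade this information gain off against the robot's own task reward rather than optimizing entropy alone.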
