Improving Robot Controller Transparency Through Autonomous Policy Explanation

Shared expectations and mutual understanding are critical facets of teamwork. Achieving these in human-robot collaborative contexts can be especially challenging, as humans and robots are unlikely to share a common language to convey intentions, plans, or justifications. Even in cases where human co-workers can inspect a robot's control code, and particularly when statistical methods are used to encode control policies, there is no guarantee that meaningful insights into a robot's behavior can be derived or that a human will be able to efficiently isolate the behaviors relevant to the interaction. We present a series of algorithms and an accompanying system that enables robots to autonomously synthesize policy descriptions and respond to both general and targeted queries by human collaborators. We demonstrate applicability to a variety of robot controller types, including those that use conditional logic, tabular reinforcement learning, and deep reinforcement learning, synthesizing informative policy descriptions for collaborators and facilitating fault diagnosis by non-experts.
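
To make the core idea concrete, below is a minimal, hypothetical sketch (Python) of one way a tabular policy could be summarized in natural language: group states by the action the policy selects, then describe each group by the predicate values its states have in common. All names here (PREDICATES, policy, describe_action) are illustrative, not from the paper, and the shared-predicate intersection is a simplified stand-in for the kind of Boolean logic minimization (e.g., Quine-McCluskey) a full system would apply over state predicates.

```python
# Hypothetical sketch: summarize a tabular policy by grouping states that
# share the same action, then describing each group by the predicate
# values those states agree on. Toy example only; a real system would
# minimize the resulting Boolean expressions rather than intersect them.

PREDICATES = ["holding_part", "at_station", "conveyor_moving"]

# Toy tabular policy: maps a tuple of Boolean predicate values to an action.
policy = {
    (True,  True,  False): "place_part",
    (True,  True,  True):  "wait",
    (True,  False, False): "move_to_station",
    (True,  False, True):  "move_to_station",
    (False, True,  False): "pick_part",
    (False, True,  True):  "pick_part",
    (False, False, False): "move_to_station",
    (False, False, True):  "move_to_station",
}

def describe_action(action):
    """Return the predicate assignments shared by all states mapped to `action`."""
    states = [s for s, a in policy.items() if a == action]
    shared = {}
    for i, name in enumerate(PREDICATES):
        values = {s[i] for s in states}
        if len(values) == 1:  # predicate is constant across the whole group
            shared[name] = values.pop()
    return shared

for action in sorted(set(policy.values())):
    conditions = describe_action(action)
    clause = " and ".join(
        name if value else f"not {name}" for name, value in conditions.items()
    ) or "in any state"
    print(f"I {action.replace('_', ' ')} when {clause}.")
```

Running this prints descriptions such as "I pick part when not holding_part and at_station." The same grouping step extends to learned controllers: for tabular or deep reinforcement learning, the greedy action at each (discretized) state induces the state-to-action map that the summarization operates on.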
