Improving Robot Controller Transparency Through Autonomous Policy Explanation

Shared expectations and mutual understanding are critical facets of teamwork. Achieving these in human-robot collaborative contexts can be especially challenging, as humans and robots are unlikely to share a common language to convey intentions, plans, or justifications. Even in cases where human co-workers can inspect a robot's control code, and particularly when statistical methods are used to encode control policies, there is no guarantee that meaningful insights into a robot's behavior can be derived or that a human will be able to efficiently isolate the behaviors relevant to the interaction. We present a series of algorithms and an accompanying system that enables robots to autonomously synthesize policy descriptions and respond to both general and targeted queries by human collaborators. We demonstrate applicability to a variety of robot controller types, including those that use conditional logic, tabular reinforcement learning, and deep reinforcement learning, synthesizing informative policy descriptions for collaborators and facilitating fault diagnosis by non-experts.
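
To make the core idea concrete, below is a minimal, hypothetical sketch (Python) of one way a tabular policy could be summarized in natural language: group states by the action the policy selects, then describe each group by the predicate values its states have in common. All names here (PREDICATES, policy, describe_action) are illustrative, not from the paper, and the shared-predicate intersection is a simplified stand-in for the kind of Boolean logic minimization (e.g., Quine-McCluskey) a full system would apply over state predicates.

```python
# Hypothetical sketch: summarize a tabular policy by grouping states that
# share the same action, then describing each group by the predicate
# values those states agree on. Toy example only; a real system would
# minimize the resulting Boolean expressions rather than intersect them.

PREDICATES = ["holding_part", "at_station", "conveyor_moving"]

# Toy tabular policy: maps a tuple of Boolean predicate values to an action.
policy = {
    (True,  True,  False): "place_part",
    (True,  True,  True):  "wait",
    (True,  False, False): "move_to_station",
    (True,  False, True):  "move_to_station",
    (False, True,  False): "pick_part",
    (False, True,  True):  "pick_part",
    (False, False, False): "move_to_station",
    (False, False, True):  "move_to_station",
}

def describe_action(action):
    """Return the predicate assignments shared by all states mapped to `action`."""
    states = [s for s, a in policy.items() if a == action]
    shared = {}
    for i, name in enumerate(PREDICATES):
        values = {s[i] for s in states}
        if len(values) == 1:  # predicate is constant across the whole group
            shared[name] = values.pop()
    return shared

for action in sorted(set(policy.values())):
    conditions = describe_action(action)
    clause = " and ".join(
        name if value else f"not {name}" for name, value in conditions.items()
    ) or "in any state"
    print(f"I {action.replace('_', ' ')} when {clause}.")
```

Running this prints descriptions such as "I pick part when not holding_part and at_station." The same grouping step extends to learned controllers: for tabular or deep reinforcement learning, the greedy action at each (discretized) state induces the state-to-action map that the summarization operates on.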
