Enhancing Robot Learning with Human Social Cues

Imagine a learning scenario between two humans: a teacher demonstrating how to play a new musical instrument, or a craftsman teaching a skill such as pottery or knitting to a novice. Although mastering the nuances of a technique takes time, the teacher and the student follow some basic social principles that help the learning process eventually succeed. This teaching communication rests on several assumptions, or social priors: mutual eye contact to draw attention to instructions, the student following the teacher's gaze to understand the skill, the teacher following the student's gaze during imitation to give feedback, the teacher pointing toward an object she is about to approach or manipulate, and verbal interruptions or corrections during the learning process [1], [2]. Prior research has shown that verbal and non-verbal social cues such as eye gaze and gestures make human-human interactions seamless and augment verbal, collaborative behavior [3], [4]. They serve as indicators of engagement, interest, and attention when people interact face-to-face with one another [5], [6].

[1] Rodina Binti Ahmad et al. A systematic literature review on vision based gesture recognition techniques. Multimedia Tools and Applications, 2018.

[2] Hideaki Kuzuoka et al. Museum guide robot based on sociological interaction analysis. CHI, 2007.

[3] Siddhartha S. Srinivasa et al. Eye-Hand Behavior in Human-Robot Shared Manipulation. 13th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2018.

[4] Andrea Lockerd Thomaz et al. Human Gaze Following for Human-Robot Interaction. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

[5] J. Wertsch et al. The creation of context in joint problem-solving. 1984.

[6] Scott Niekum et al. Understanding Teacher Gaze Patterns for Robot Learning. CoRL, 2019.

[7] Kris M. Kitani et al. Hand parsing for fine-grained recognition of human grasps in monocular images. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.

[8] Katharina J. Rohlfing et al. Toward designing a robot that learns actions from parental demonstrations. IEEE International Conference on Robotics and Automation (ICRA), 2008.

[9] M. Argyle et al. Gaze and Mutual Gaze. British Journal of Psychiatry, 1994.

[10] Tommy Strandvall et al. Eye Tracking in Human-Computer Interaction and Usability Research. INTERACT, 2009.

[11] Antonio Torralba et al. Where are they looking? NIPS, 2015.

[12] Alejandro Bordallo et al. Physical symbol grounding and instance learning through demonstration and eye tracking. IEEE International Conference on Robotics and Automation (ICRA), 2017.

[13] Eyal Amir et al. Bayesian Inverse Reinforcement Learning. IJCAI, 2007.

[14] Antonio Torralba et al. Following Gaze in Video. IEEE International Conference on Computer Vision (ICCV), 2017.

[15] Candace L. Sidner et al. Recognizing engagement in human-robot interaction. HRI, 2010.

[16] Peter Robinson et al. OpenFace: An open source facial behavior analysis toolkit. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.

[17] Yaser Sheikh et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

[18] M. L. Abercrombie et al. Non-verbal communication. Proceedings of the Royal Society of Medicine, 1972.

[19] B. Scassellati et al. Social eye gaze in human-robot interaction. HRI, 2017.

[20] Pierre Dillenbourg et al. From real-time attention assessment to "with-me-ness" in human-robot interaction. 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2016.

[21] Sergey Levine et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. ICML, 2016.

[22] Raj M. Ratwani et al. Integrating vision and audition within a cognitive architecture to track conversations. 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2008.