The Impact of Human–Robot Interfaces on the Learning of Visual Objects

This paper studies the impact of interfaces that allow nonexpert users to efficiently and intuitively teach a robot to recognize new visual objects. We present challenges that need to be addressed for real-world deployment of robots capable of learning new visual objects in interaction with everyday users. We argue that in addition to robust machine learning and computer vision methods, well-designed interfaces are crucial for learning efficiency. In particular, we argue that interfaces can be key in helping nonexpert users to collect good learning examples and thus improve the performance of the overall learning system. We then present four alternative human–robot interfaces: three based on the use of a mediating artifact (smartphone; Wiimote; Wiimote combined with a laser pointer) and one based on natural human gestures (with a Wizard-of-Oz recognition system). These interfaces vary mainly in the kind of feedback provided to the user, allowing users to understand, with varying ease, what the robot is perceiving and thus adapt the way they provide training examples. We then evaluate the impact of these interfaces, in terms of learning efficiency, usability, and user experience, through a large-scale real-world user study in which we asked participants to teach a robot 12 different new visual objects in the context of a robotic game. The game takes place in a home-like environment and was designed to motivate and engage users in an interaction where using the system was meaningful. We then discuss results that show significant differences among interfaces. In particular, we show that interfaces such as the smartphone allow nonexpert users to intuitively provide much better training examples to the robot, almost as good as those of expert users who are trained for this task and aware of the underlying visual perception and machine learning issues. We also show that artifact-mediated teaching is significantly more efficient for robot learning than, and as good in terms of usability and user experience as, teaching through gesture-based, human-like interaction.
