Towards a Robust Interactive and Learning Social Robot

Pepper is a humanoid robot, designed specifically for social interaction, that has been deployed in a variety of public environments. A programmable version of Pepper is also available, enabling our research focused on the perception and behavior robustness and capabilities of an interactive social robot. We address Pepper's perception by integrating state-of-the-art vision and speech recognition systems and experimentally analyzing their effectiveness. Recognizing the limitations of the individual perceptual modalities, we introduce a multi-modal approach to increase the robustness of human social interaction with the robot. We combine vision, gesture, speech, and input from an onboard tablet, a remote mobile phone, and external microphones. Our approach includes proactively seeking input from a different modality, adding robustness against failures of the individual components. We also introduce a learning algorithm that improves communication capabilities over time, updating speech recognition through social interactions. Finally, we leverage the robot's rich body-sensory data and introduce both a nearest-neighbor and a deep learning approach that enable Pepper to classify and verbalize a variety of its own body motions. We view the contributions of our work as relevant both to Pepper specifically and to social robots in general.
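
The proactive fallback across modalities can be summarized as a confidence-gated search over input channels. The following is a minimal sketch of that idea, not the authors' implementation; the class and function names (Modality, Interpretation, ask_with_fallback) and the confidence threshold are hypothetical.

```python
# Sketch of proactive multi-modal fallback: if the current input channel
# returns a low-confidence (or no) interpretation, the robot asks for the
# same information through the next channel (speech, tablet, phone, ...).
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interpretation:
    text: str          # recognized command or answer
    confidence: float  # recognizer confidence in [0, 1]


class Modality:
    """One input channel (speech, tablet touch, remote phone, gesture, ...)."""
    name = "abstract"

    def query(self, prompt: str) -> Optional[Interpretation]:
        raise NotImplementedError  # concrete channels implement this


def ask_with_fallback(modalities: List[Modality],
                      prompt: str,
                      threshold: float = 0.7) -> Optional[Interpretation]:
    """Try each modality in order of preference; accept the first result
    whose confidence exceeds the threshold, otherwise proactively switch
    to the next modality and keep the best low-confidence guess."""
    best = None
    for modality in modalities:
        result = modality.query(prompt)
        if result is None:
            continue  # this channel failed outright, try the next one
        if result.confidence >= threshold:
            return result
        if best is None or result.confidence > best.confidence:
            best = result
    return best  # possibly low confidence, or None if every channel failed
```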
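For the body-motion classification, a nearest-neighbor classifier over recorded joint-angle trajectories is one simple realization. The sketch below assumes a motion is a fixed-length sequence of joint-angle vectors read from the robot's sensors; the class name, motion labels, and dimensions are invented for illustration.

```python
# 1-nearest-neighbor classification of robot body motions from flattened
# joint-angle trajectories (a stand-in for the nearest-neighbor approach;
# the deep learning variant would replace this with a recurrent model).
import numpy as np


class MotionNearestNeighbor:
    def __init__(self):
        self.examples = []  # list of (flattened trajectory, label) pairs

    def add_example(self, trajectory: np.ndarray, label: str) -> None:
        """trajectory: array of shape (timesteps, num_joints)."""
        self.examples.append((trajectory.ravel(), label))

    def classify(self, trajectory: np.ndarray) -> str:
        """Return the label of the stored example closest in Euclidean
        distance to the observed joint-angle trajectory."""
        query = trajectory.ravel()
        distances = [np.linalg.norm(query - ex) for ex, _ in self.examples]
        return self.examples[int(np.argmin(distances))][1]


# Hypothetical usage: 20 timesteps of 17 joint angles per motion.
clf = MotionNearestNeighbor()
clf.add_example(np.zeros((20, 17)), "wave")
clf.add_example(np.ones((20, 17)), "nod")
print(clf.classify(np.full((20, 17), 0.9)))  # -> "nod"
```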
