AveRobot: An Audio-visual Dataset for People Re-identification and Verification in Human-Robot Interaction

Intelligent technologies have pervaded our daily life, making it easier for people to complete their activities. One emerging application involves the use of robots to assist people in various tasks (e.g., visiting a museum). In this context, it is crucial to enable robots to correctly identify people. Existing robots often use facial information to establish the identity of a person of interest. The face alone, however, may not offer enough relevant information due to variations in pose, illumination, resolution and recording distance. Other biometric modalities, such as the voice, can improve recognition performance under these conditions. However, existing datasets in robotic scenarios usually do not include the audio cue and tend to suffer from one or more limitations: most of them are acquired under controlled conditions, are limited in the number of identities or samples per user, are collected with the same recording device, and/or are not freely available. In this paper, we propose AveRobot, an audio-visual dataset of 111 participants vocalizing short sentences under robot assistance scenarios. The collection took place in a three-floor building through eight different cameras with built-in microphones. Face and voice re-identification and verification performance was evaluated on this dataset with deep learning baselines and compared against audio-visual datasets from diverse scenarios. The results showed that AveRobot is a challenging dataset for people re-identification and verification.
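The two tasks evaluated on AveRobot differ in how a match is decided. Re-identification is a 1:N search: a probe sample is compared against a gallery of enrolled people. Verification is a 1:N-free, 1:1 check: a pair of samples is accepted or rejected as belonging to the same person. The abstract does not state how the baselines score matches. The sketch below therefore *assumes* that each face or voice sample has already been mapped to a fixed-length embedding by some deep network, and it uses cosine similarity, a common but assumed choice. The function names, the threshold value, and the toy vectors are hypothetical and do not come from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (m x d) and b (n x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def reidentify(probe: np.ndarray, gallery: np.ndarray, gallery_ids: list) -> list:
    """Re-identification (1:N): for each probe embedding, return the identity
    of the most similar gallery embedding (closed-set, rank-1 decision)."""
    scores = cosine_similarity(probe, gallery)  # shape: (num_probes, num_gallery)
    return [gallery_ids[i] for i in np.argmax(scores, axis=1)]


def verify(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float) -> bool:
    """Verification (1:1): accept the identity claim if the similarity of the
    two embeddings reaches a hypothetical decision threshold."""
    score = cosine_similarity(emb_a[None, :], emb_b[None, :])[0, 0]
    return bool(score >= threshold)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real face or voice features.
    gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
    ids = ["alice", "bob"]
    probe = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(reidentify(probe, gallery, ids))
    print(verify(gallery[0], probe[0], threshold=0.8))
```

In a real evaluation the threshold would be tuned on held-out pairs rather than fixed by hand. Re-identification needs no threshold in this closed-set form, because it only ranks gallery candidates.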
