AIR-Act2Act: Human-human interaction dataset for teaching non-verbal social behaviors to robots

To interact well with users, a social robot should understand their behavior, infer their intention, and respond appropriately. Machine learning is one way of implementing robot intelligence: it provides the ability to learn and improve from experience automatically, instead of the robot being explicitly told what to do. Social skills can also be learned by watching human-human interaction videos. However, human-human interaction datasets that cover interactions occurring in various situations are relatively scarce. Moreover, although we aim to use service robots in the elderly-care domain, no interaction dataset has been collected for this domain. For these reasons, we introduce a human-human interaction dataset for teaching non-verbal social behaviors to robots. It is the only interaction dataset in which elderly people have participated as performers. We recruited 100 elderly people and two college students to perform 10 interactions in an indoor environment. The entire dataset contains 5,000 interaction samples, each of which includes depth maps, body indices, and 3D skeletal data captured with three Microsoft Kinect v2 cameras. In addition, we provide the joint angles of a humanoid NAO robot, converted from the human behaviors that the robot needs to learn. The dataset and useful Python scripts are available for download at this https URL. The dataset can be used not only to teach social skills to robots but also to benchmark action recognition algorithms.
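
To make the skeleton-to-robot conversion concrete, the sketch below estimates one NAO joint angle (the right elbow bend, RElbowRoll) from three Kinect v2 skeleton joints. This is a minimal, hypothetical illustration, not the dataset's actual conversion script: the joint coordinates are made up, and the real mapping from human skeletons to NAO joints may differ.

```python
# Minimal sketch: deriving a NAO elbow angle from Kinect v2 skeleton joints.
# Illustrative only -- the dataset's own conversion may map joints differently.
import numpy as np

def elbow_angle(shoulder, elbow, wrist):
    """Angle (rad) between the upper arm and forearm at the elbow."""
    upper = np.asarray(shoulder) - np.asarray(elbow)
    fore = np.asarray(wrist) - np.asarray(elbow)
    cos = np.dot(upper, fore) / (np.linalg.norm(upper) * np.linalg.norm(fore))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hypothetical 3D positions (meters) of three right-arm Kinect v2 joints.
shoulder_r = [0.20, 0.45, 2.10]
elbow_r = [0.25, 0.20, 2.15]
wrist_r = [0.10, 0.05, 1.95]

bend = elbow_angle(shoulder_r, elbow_r, wrist_r)
# NAO's RElbowRoll measures deviation from a straight arm, so take the
# complement to pi (a straight arm gives bend = pi, i.e. roll = 0).
r_elbow_roll = np.pi - bend
print(f"RElbowRoll ~= {r_elbow_roll:.2f} rad")
```

The arccos of the normalized dot product gives the bend between the upper arm and forearm vectors; because NAO expresses elbow roll relative to a fully extended arm, the complement to pi converts one convention to the other.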
