A Mobile Robot Generating Video Summaries of Seniors' Indoor Activities

We develop a system that generates summaries of seniors' indoor-activity videos captured by a social robot, helping remote family members keep track of their seniors' daily activities at home. Unlike traditional video summarization datasets, indoor videos captured by a moving robot pose additional challenges: (i) the video sequences are very long; (ii) a significant number of video frames contain no subject, or contain subjects at ill-posed locations and scales; and (iii) most of the well-posed frames contain highly redundant information. To address these challenges, we propose exploiting pose estimation to detect people in frames; the detections guide the robot to follow the user and capture useful video. We use person identification to distinguish the target senior from other people. We also use action recognition to analyze the senior's major activities at different moments, and we develop a video summarization method that selects diverse and representative keyframes as the summary.
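The frame-filtering and keyframe-selection steps described above can be illustrated with a minimal sketch. It is not the authors' method. It assumes, hypothetically, that each frame comes with 2D keypoints as (x, y, confidence) triples normalized to [0, 1] (or `None` when no person is detected) and with a per-frame feature vector, e.g. from a CNN. Frames whose subject is missing, too small, too large, or cut off at the border are discarded, mirroring challenge (ii). Keyframes are then chosen greedily to maximize a representativeness term (how close every kept frame is to its nearest keyframe) plus a weighted diversity term (mean pairwise cosine dissimilarity among keyframes), which addresses the redundancy in challenge (iii). These two terms resemble the diversity and representativeness rewards in Zhou et al.'s DR-DSN. The function names and the thresholds `min_height`, `max_height`, `margin`, `min_visible`, `conf_thresh` and `lam` are illustrative, not values from this work.

```python
import numpy as np


def is_well_posed(keypoints, min_height=0.25, max_height=0.95,
                  margin=0.05, min_visible=8, conf_thresh=0.1):
    """Hypothetical check: is one detected person well placed and scaled?

    keypoints: (x, y, confidence) triples, flat or shaped (K, 3), with x and y
    assumed normalized to [0, 1] by the image width and height.
    """
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 3)
    visible = kp[kp[:, 2] > conf_thresh]
    if len(visible) < min_visible:  # no subject, or too few joints detected
        return False
    x0, y0 = visible[:, :2].min(axis=0)
    x1, y1 = visible[:, :2].max(axis=0)
    # Reject people cut off by the frame border.
    inside = x0 >= margin and y0 >= margin and x1 <= 1 - margin and y1 <= 1 - margin
    # Reject people who appear too small (far away) or too large (too close).
    return bool(inside and min_height <= (y1 - y0) <= max_height)


def diversity(feats, idx):
    """Mean pairwise cosine dissimilarity among selected frames (0 for < 2 frames)."""
    if len(idx) < 2:
        return 0.0
    x = feats[idx]
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = x @ x.T
    n = len(idx)
    mean_off_diag = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(1.0 - mean_off_diag)


def representativeness(feats, idx):
    """exp(-mean Euclidean distance from every frame to its nearest selected frame)."""
    sel = feats[idx]
    d = np.linalg.norm(feats[:, None, :] - sel[None, :, :], axis=2)
    return float(np.exp(-d.min(axis=1).mean()))


def select_keyframes(feats, k, lam=1.0):
    """Greedily add the frame that maximizes representativeness + lam * diversity."""
    feats = np.asarray(feats, dtype=float)
    selected = []
    for _ in range(min(k, len(feats))):
        best, best_score = None, -np.inf
        for i in range(len(feats)):
            if i in selected:
                continue
            cand = selected + [i]
            score = representativeness(feats, cand) + lam * diversity(feats, cand)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)


def summarize(keypoints_per_frame, feats, k, lam=1.0):
    """Drop ill-posed frames (None = no detection), then pick up to k keyframes.

    Returns indices into the original frame sequence, in temporal order.
    """
    feats = np.asarray(feats, dtype=float)
    keep = [i for i, kp in enumerate(keypoints_per_frame)
            if kp is not None and is_well_posed(kp)]
    if not keep:
        return []
    return [keep[j] for j in select_keyframes(feats[keep], k, lam)]
```

The greedy loop recomputes both terms for every candidate, so its cost grows quickly with the number of frames. For the very long sequences described above, one would likely subsample or pre-cluster frames before running it.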
