Person Re-ID by Fusion of Video Silhouettes and Wearable Signals for Home Monitoring Applications

The use of visual sensors for monitoring people in their living environments is critical for obtaining more accurate health measurements, but their use is undermined by privacy concerns. Silhouettes generated from RGB video can alleviate these concerns to a considerable degree. However, silhouettes make it difficult to discriminate between different subjects, preventing a subject-tailored analysis of the data within a free-living, multi-occupancy home. This limitation can be overcome with a strategic fusion of sensors: wearable accelerometer devices, used in conjunction with the silhouette video data, allow video clips to be matched to the specific patient being monitored. The proposed method simultaneously solves the problem of person Re-ID using silhouettes and enables home monitoring systems to employ sensor fusion techniques for data analysis. We develop a multimodal deep-learning detection framework that maps short video clips and accelerations into a latent space, where the Euclidean distance between embeddings is used to match video and acceleration streams. We train our method on the SPHERE Calorie Dataset, on which we show an average area under the ROC curve of 76.3% and an assignment accuracy of 77.4%. In addition, we propose a novel triplet loss, for which we demonstrate improved performance and convergence speed.
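The matching step described above can be illustrated with a minimal sketch. It assumes that embeddings for video clips and acceleration streams have already been computed by some encoders, and it uses a standard margin-based triplet loss, not the novel loss proposed in this work, whose form is not given here. The function names, the margin value, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Standard margin-based triplet loss (an assumption, not the paper's loss).

    The anchor could be a video-clip embedding, the positive the matching
    acceleration embedding, and the negative a non-matching one.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def assign_clips(video_embs: List[Vector], accel_embs: List[Vector]) -> List[int]:
    """For each video embedding, return the index of the nearest
    acceleration embedding in the shared latent space."""
    return [
        min(range(len(accel_embs)), key=lambda j: euclidean(v, accel_embs[j]))
        for v in video_embs
    ]


# Toy 2-D embeddings, purely illustrative.
video = [[0.0, 0.0], [1.0, 1.0]]
accel = [[0.9, 1.1], [0.1, -0.1]]
assignment = assign_clips(video, accel)
```

In this toy case each clip is assigned to the wearable whose embedding lies closest to it. A real system would likely need a one-to-one assignment constraint when several people are in view, which this sketch does not enforce.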
