Who Goes There? Exploiting Silhouettes and Wearable Signals for Subject Identification in Multi-Person Environments

Re-identifying people in private environments is a difficult task, both technically and because of the ethical issues it raises. The lack of a privacy-sensitive technology for monitoring specific individuals hinders the uptake of assistive systems, for example in Ambient Assisted Living and health monitoring applications. Our approach uses a multimodal deep learning framework that matches silhouette video clips to accelerometer signals in order to identify and re-identify subjects of interest in a multi-person environment. Brief sequences, as short as 3 seconds, are encoded in a latent space where a simple Euclidean distance is enough to separate matching pairs from non-matching ones. Identities are revealed only for the people carrying an accelerometer, and using silhouettes instead of RGB video helps to ring-fence privacy concerns. We train our method on the SPHERE Calorie Dataset, on which we report an average area under the ROC curve of 76.3%. We also propose a novel triplet loss, which we show improves both performance and convergence speed.
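
As an illustration of the matching step described above, the following minimal sketch assumes that each silhouette clip and each accelerometer window has already been encoded into a fixed-length embedding. It picks the wearer whose accelerometer embedding lies closest in Euclidean distance. The function names, the two-dimensional embeddings, and the margin value are hypothetical. The loss shown is the standard triplet loss, not the novel variant proposed in this work, which the abstract does not specify.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_silhouette(video_embedding, accel_embeddings):
    """Return the ID of the wearer whose accelerometer embedding is
    closest to the silhouette-clip embedding (hypothetical helper)."""
    return min(
        accel_embeddings,
        key=lambda wearer: squared_distance(video_embedding, accel_embeddings[wearer]),
    )


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on squared distances: pull the matching pair
    together and push the non-matching pair at least `margin` further away.
    The margin of 0.2 is an assumed value for illustration only."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: one silhouette clip and two accelerometer wearers.
clip = [0.9, 0.1]
wearers = {"wearer_a": [1.0, 0.0], "wearer_b": [0.0, 1.0]}
best_match = match_silhouette(clip, wearers)
```

In this toy case the clip embedding lies nearest to `wearer_a`, so that wearer is returned. In practice a distance threshold would also be needed to reject silhouettes that belong to people not wearing any accelerometer.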
