Vision2Sensor

Mobile and wearable devices are pervasive and come packed with a growing number of sensors. These sensors are meant to provide intelligent systems with direct observations of user activity and context, and are envisioned to be at the core of smart buildings, automating the habitat to suit user needs. However, much of this enormous sensing capability currently goes untapped, because developing context recognition systems requires a substantial amount of labeled sensor data to train models on. Sensor data is hard to interpret and annotate after collection, making large training sets difficult and costly to generate, which is stalling the adoption of mobile sensing at scale. We address this fundamental problem in the ubicomp community, the scarcity of training data, by proposing a knowledge transfer framework, Vision2Sensor, which opportunistically transfers information from an easy-to-interpret and more advanced sensing modality, vision, to other sensors on mobile devices. Activities recognized by computer vision in the camera's field of view are synchronized with inertial sensor data to produce labels, which are then used to dynamically update a mobile sensor-based recognition model. We show that transfer learning is also beneficial for identifying the best Convolutional Neural Network for vision-based human activity recognition on our task: a candidate network is first evaluated on a larger dataset, and the pre-trained model is then fine-tuned on our five-class activity recognition task. Our sensor-based Deep Neural Network is robust to substantial degradation of label quality, dropping just 3% in accuracy when 15% noise is induced in the vision-generated labels. This indicates that knowledge transfer between sensing modalities is achievable even with significant noise introduced by the labeling modality. Our system operates in real time on embedded computing devices, preserving user data privacy by performing all computation within the local network.
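
To make the label-transfer mechanism concrete, the sketch below shows one plausible form of the online update loop: each sufficiently confident vision prediction is synchronized by timestamp with the corresponding inertial window and used as a training label for the sensor-based model. This is a minimal illustration, not the paper's implementation; the toy architecture, the confidence threshold, the window size, and all function names (`vision_label`, `imu_window_at`) are assumptions introduced here.

```python
# Minimal sketch of cross-modal label transfer: vision predictions become
# training labels for time-aligned inertial (IMU) windows. All names,
# thresholds, and the toy architecture are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn

NUM_CLASSES = 5          # five-class activity recognition task
WINDOW = 100             # IMU samples per window (assumed: 2 s at 50 Hz)
CONF_THRESHOLD = 0.8     # only transfer labels the vision model is confident about

# Stand-in IMU classifier: a small 1D CNN over 6 accelerometer/gyroscope channels.
imu_model = nn.Sequential(
    nn.Conv1d(6, 32, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, NUM_CLASSES),
)
optimizer = torch.optim.Adam(imu_model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def vision_label(frame_ts):
    """Placeholder for the vision pipeline: returns (timestamp, class, confidence)."""
    return frame_ts, np.random.randint(NUM_CLASSES), np.random.rand()

def imu_window_at(ts):
    """Placeholder: fetch the IMU window whose timespan covers `ts`."""
    return torch.randn(1, 6, WINDOW)

# Online update loop: each confident vision prediction becomes a training
# label for the time-aligned inertial window, dynamically updating the model.
for frame_ts in np.arange(0.0, 10.0, 0.5):         # simulated camera timestamps
    ts, label, conf = vision_label(frame_ts)
    if conf < CONF_THRESHOLD:
        continue                                    # discard low-confidence labels
    x = imu_window_at(ts)                           # synchronize by timestamp
    y = torch.tensor([label])
    optimizer.zero_grad()
    loss = loss_fn(imu_model(x), y)
    loss.backward()
    optimizer.step()
```

Thresholding on the vision model's confidence is one natural way to limit the label noise propagated to the sensor model, consistent with the reported robustness to degraded label quality.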
