Sense and Learn: Self-Supervision for Omnipresent Sensors

Learning general-purpose representations from the multisensor data produced by omnipresent sensing systems (or the IoT in general) has numerous applications across diverse domains. Existing purely supervised end-to-end deep learning techniques depend on the availability of a massive amount of well-curated data, which is notoriously difficult to acquire but required to achieve a sufficient level of generalization on a task of interest. In this work, we leverage the self-supervised learning paradigm towards realizing the vision of continual learning from unlabeled inputs. We present a generalized framework named Sense and Learn for representation or feature learning from raw sensory data. It consists of eight auxiliary tasks that can learn high-level and broadly useful features entirely from unannotated data, without any human involvement in the tedious labeling process. We demonstrate the efficacy of our approach on several publicly available datasets from different domains and in various settings, including linear separability, semi-supervised or few-shot learning, and transfer learning. Our methodology achieves results that are competitive with supervised approaches and, in most cases, closes the gap when the network is fine-tuned while learning the downstream tasks. In particular, we show that the self-supervised network can be used as an initialization to significantly boost performance in a low-data regime with as few as 5 labeled instances per class, which is of high practical importance for real-world problems. Likewise, the representations learned with self-supervision are found to be highly transferable between related datasets, even when few labeled instances are available from the target domains. The self-learning nature of our methodology opens up exciting possibilities for on-device continual learning.
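As an illustration of the low-data regime mentioned above, the following minimal sketch draws a fixed number of labeled instances per class (5 by default) from a labeled pool. The helper name `sample_k_per_class`, the seed handling, and the toy labels are assumptions made for this example only; they are not taken from the Sense and Learn implementation.

```python
import random
from collections import defaultdict


def sample_k_per_class(labels, k=5, seed=0):
    """Return sorted indices that select exactly k examples for every class.

    Hypothetical helper: it only illustrates how a "k labeled instances per
    class" subset could be drawn, not how the authors actually did it.
    """
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label, indices in sorted(by_class.items()):
        if len(indices) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy labels: 3 classes with 20 examples each (illustrative data only).
toy_labels = [cls for cls in range(3) for _ in range(20)]
few_shot_indices = sample_k_per_class(toy_labels, k=5)
```

The selected indices would then pick out the small labeled subset used to fine-tune a pre-trained network, while the remaining unlabeled data stays available for the self-supervised stage.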
