Semi-Supervised Distillation: Personalizing Deep Neural Networks in Activity Recognition using Inertial Sensors

Personalization of activity recognition has become a topic of interest as a way to improve recognition performance for diverse users. Recent research shows that deep neural networks improve generalization performance in activity recognition using inertial sensors such as accelerometers and gyroscopes; however, personalizing deep neural networks is challenging because such networks have thousands or millions of parameters, while personalization must generally be done with only a small amount of labeled data. This paper proposes a novel way to personalize deep neural networks that prevents overfitting by using unlabeled data. This is done by adding an output-distribution similarity regularization term between the reference model and the personalized model, an extension of the distillation technique recently proposed by Hinton et al. Experiments on the Opportunity activity recognition dataset, one of the best-known datasets in the field, demonstrate that the proposed regularization prevents overfitting even when only a few labeled samples are available per target class for each user, and that it provides better recognition performance than other personalization techniques. We also conduct further experiments, including a setting with no labeled data and a combination of the proposed method with widely used personalization techniques, to check whether the proposed method complements or competes with existing approaches. The results suggest that the proposed regularization works well in various settings and is complementary to existing methods.
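To make the idea concrete, the following is a minimal sketch of a distillation-style personalization loss, written in PyTorch under our own assumptions: the function name, temperature T, and weight lam are illustrative and are not values or API taken from the paper. It combines cross-entropy on the user's few labeled samples with a KL term that keeps the personalized model's output distribution close to that of a frozen reference model on unlabeled data.

```python
# Minimal sketch (assumed PyTorch implementation, illustrative hyperparameters).
import torch
import torch.nn.functional as F

def personalization_loss(personal_model, reference_model,
                         x_labeled, y_labeled, x_unlabeled,
                         T=2.0, lam=0.5):
    """Supervised loss on the small labeled set plus an output-distribution
    similarity (distillation) regularizer computed on unlabeled data."""
    # Cross-entropy on the user's few labeled samples.
    ce = F.cross_entropy(personal_model(x_labeled), y_labeled)

    # Soft targets from the frozen reference (generic) model.
    with torch.no_grad():
        ref_probs = F.softmax(reference_model(x_unlabeled) / T, dim=1)

    # KL divergence between personalized and reference output distributions.
    per_log_probs = F.log_softmax(personal_model(x_unlabeled) / T, dim=1)
    kl = F.kl_div(per_log_probs, ref_probs, reduction="batchmean") * (T * T)

    return ce + lam * kl
```

In a setting with no labeled data for the target user, the cross-entropy term would simply be dropped and only the regularizer on unlabeled data would drive adaptation.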
