Feature learning for Human Activity Recognition using Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are increasingly used as a feature learning method for Human Activity Recognition (HAR). Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. However, CNNs require a training phase, which makes them prone to the cold-start problem. This work presents a case study that evaluates a pre-trained CNN feature extractor under realistic conditions. The case study consists of two main steps: (1) different topologies and parameters are assessed to identify the best candidate models for HAR, yielding a pre-trained CNN model; (2) the pre-trained model is then employed as a feature extractor, and its use is evaluated on a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) based HAR and audio-based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset and 67.51% on the real-world Extrasensory dataset. For the audio data, balanced accuracy was 92.30% on the DCASE 2017 dataset and 35.24% on the Extrasensory dataset.
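The results above are reported as balanced accuracy. As a minimal sketch, assuming the common definition of balanced accuracy as the mean of per-class recall (the abstract does not state which exact formulation was used), the metric can be computed as follows. The function name `balanced_accuracy` and the toy activity labels are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, taken over the classes present in y_true.

    Assumption: this is the common definition of balanced accuracy; the
    abstract does not spell out the exact formulation it uses.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 8 "walk" windows and 2 "sit" windows.
y_true = ["walk"] * 8 + ["sit"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["walk"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case plain accuracy rewards the majority-class guess, while balanced accuracy drops to the chance level for two classes. This is one reason the metric suits imbalanced, real-world activity data.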
