Comparing CNN and Human Crafted Features for Human Activity Recognition

Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One advantage of these methods is their ability to generate features automatically. This ability greatly simplifies feature extraction, which usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite this advantage, very little work has analyzed the quality of the extracted features, and more specifically how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for recognizing simple activities, applying this approach to signals from both inertial and audio sensors. The paper makes the following contributions: (i) a comparison of automatically extracted CNN features with gold-standard Human Crafted Features (HCF); and (ii) a comprehensive analysis of how architecture and model parameters affect the separation of target classes in the feature space. Results are evaluated on publicly available datasets. In particular, we achieved a 93.38% F-Score on the UCI-HAR dataset using 1D CNNs with 3 convolutional layers and a kernel size of 32, and a 90.5% F-Score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor and vehicle), using 2D CNNs with 2 convolutional layers and a 2×2 kernel size.
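To make the idea of "CNN features" and "class separation in the feature space" concrete, the following is a minimal NumPy sketch, not the paper's implementation. It runs an untrained 1D CNN with 3 valid (unpadded) convolutional layers of kernel size 32 over one window shaped like UCI-HAR's raw inertial windows (128 samples, 9 channels), then applies global average pooling to obtain a feature vector. The filter count (16), the random He-style initialization, the global average pooling, and the separability score are all assumptions made for illustration. The score used here is a Fisher-style ratio of between-class to within-class scatter, which stands in for whatever separation measure the paper actually uses.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """Valid (no padding), stride-1 1D convolution (cross-correlation, as in CNNs).

    x: (length, in_channels); w: (kernel, in_channels, out_channels); b: (out_channels,)
    Returns an array of shape (length - kernel + 1, out_channels).
    """
    k = w.shape[0]
    out_len = x.shape[0] - k + 1
    # Sliding windows of shape (out_len, kernel, in_channels)
    windows = np.stack([x[i:i + k] for i in range(out_len)])
    # Sum over kernel positions and input channels for every output channel
    return np.einsum("tki,kio->to", windows, w) + b


def make_layers(in_channels, filters, kernel_size, n_layers, rng):
    """Hypothetical helper: random (untrained) weights for a stack of conv layers."""
    layers = []
    c = in_channels
    for _ in range(n_layers):
        scale = np.sqrt(2.0 / (kernel_size * c))  # He-style initialization (assumption)
        w = rng.normal(0.0, scale, size=(kernel_size, c, filters))
        layers.append((w, np.zeros(filters)))
        c = filters
    return layers


def cnn_features(x, layers):
    """Conv + ReLU for each layer, then global average pooling over time."""
    h = x
    for w, b in layers:
        h = np.maximum(conv1d_valid(h, w, b), 0.0)  # ReLU
    return h.mean(axis=0)  # one feature per output channel


def separability(features, labels):
    """Fisher-style score: between-class scatter / within-class scatter (trace form).

    Larger values mean the classes are further apart relative to their spread.
    """
    mu = features.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        between += len(fc) * np.sum((mu_c - mu) ** 2)
        within += np.sum((fc - mu_c) ** 2)
    return between / within


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 3 layers, kernel size 32: time axis shrinks 128 -> 97 -> 66 -> 35
    layers = make_layers(in_channels=9, filters=16, kernel_size=32, n_layers=3, rng=rng)
    t = np.arange(128)
    # Two synthetic "activities": low-amplitude noise vs. a strong periodic signal
    still = [rng.normal(0, 0.1, size=(128, 9)) for _ in range(10)]
    walk = [np.sin(2 * np.pi * t / 25)[:, None] * np.ones((1, 9))
            + rng.normal(0, 0.1, size=(128, 9)) for _ in range(10)]
    feats = np.array([cnn_features(x, layers) for x in still + walk])
    labels = np.array([0] * 10 + [1] * 10)
    print("feature shape:", feats.shape)
    print("separability:", separability(feats, labels))
```

In a study like the one described here, the same separability score could be computed on features from each candidate architecture (number of layers, kernel size) and on the Human Crafted Features, allowing the feature spaces to be compared directly. A 2D variant for audio spectrograms would follow the same pattern, using 2×2 kernels that slide over both time and frequency.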
