Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are well suited to automating feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Building on the recent success of recurrent neural networks in time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. On the challenge dataset, our framework outperforms competing deep non-recurrent networks by 4% on average and improves on some previously reported results by up to 9%. The results also show that the framework can be applied to homogeneous sensor modalities and can fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights into their optimisation.
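To make the combination of convolutional and LSTM recurrent units concrete, the following is a minimal NumPy sketch of a forward pass over one window of multichannel sensor data. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the layer count, all sizes (T, C, F, K, H, N_CLASSES), the random weights, and the function names are hypothetical. Temporal filters are applied to each sensor channel, the resulting feature maps from all channels are concatenated at every time step (one simple way to fuse sensors), and an LSTM then models how those feature activations evolve over time.

```python
import numpy as np

def temporal_conv(x, kernels):
    """Valid 1D filtering along time, applied to every sensor channel.
    (Cross-correlation, as is conventional in deep learning.)
    x: (T, C) window; kernels: (F, K). Returns ReLU features (T-K+1, C, F)."""
    T, C = x.shape
    F, K = kernels.shape
    out = np.empty((T - K + 1, C, F))
    for t in range(T - K + 1):
        # (F, K) @ (K, C) -> (F, C); transpose to (C, F)
        out[t] = (kernels @ x[t:t + K]).T
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(seq, W, U, b):
    """Standard LSTM over seq (T', D); returns the final hidden state (H,).
    W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [input, forget, output, cell]."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in seq:
        z = W @ x_t + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # update the cell memory
        h = o * np.tanh(c)         # expose the gated memory
    return h

def forward(x, p):
    feats = temporal_conv(x, p["kernels"])       # (T', C, F)
    seq = feats.reshape(feats.shape[0], -1)      # fuse channels: (T', C*F)
    h = lstm(seq, p["W"], p["U"], p["b"])
    logits = p["V"] @ h + p["d"]
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()

# Hypothetical sizes: 24 time steps, 6 sensor channels, 4 filters of length 5,
# 8 LSTM units, 3 activity classes.
T, C, F, K, H, N_CLASSES = 24, 6, 4, 5, 8, 3
rng = np.random.default_rng(0)
params = {
    "kernels": rng.standard_normal((F, K)) * 0.1,
    "W": rng.standard_normal((4 * H, C * F)) * 0.1,
    "U": rng.standard_normal((4 * H, H)) * 0.1,
    "b": np.zeros(4 * H),
    "V": rng.standard_normal((N_CLASSES, H)) * 0.1,
    "d": np.zeros(N_CLASSES),
}
x = rng.standard_normal((T, C))   # stand-in for one window of raw sensor data
probs = forward(x, params)        # class probabilities for the window
```

Because the filters act on raw samples, no hand-designed features are needed, and adding another sensor only widens the channel dimension C of the input window.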
