Performance Analysis of Data Parallelism Technique in Machine Learning for Human Activity Recognition Using LSTM

Human activity recognition (HAR), driven by large deep learning models that process time-series data to infer activities, has received considerable attention in recent years owing to its applicability across diverse domains. Meanwhile, the cloud "as-a-service" paradigm has fundamentally reshaped the information technology market over the last decade. These two trends converge to inspire a new model for assistive living applications: HAR as a service in the cloud. However, frequent updates to open-source deep learning frameworks, together with the steady release of new hardware features, pose a significant software management challenge for deep learning model developers. To address this problem, container techniques are widely employed to streamline the deep learning development cycle. In addition, models and the available datasets are growing larger and more complex, so an increasing amount of computing resources is required to train these models in a feasible amount of time. This motivates an emerging distributed training approach, called data parallelism, which aims at efficient resource utilization and shorter training times. In this paper, we therefore apply data parallelism to build an assistive living HAR application based on an LSTM model, deployed in containers within a Kubernetes cluster to enable real-time recognition and prediction of changes in human activity patterns. We then systematically measure the influence of this technique on the performance of the HAR application. First, we evaluate system performance with regard to CPU and GPU when deployed in containers and in the host environment; we then analyze the outcomes to quantify the differences in model learning performance. The experiments show that the data parallelism strategy is effective at improving model learning performance and also increases the scaling efficiency of our system.
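To make the training setup concrete, the sketch below shows one common way to realize the data-parallel pattern the abstract describes: TensorFlow's MirroredStrategy replicating an LSTM classifier across the GPUs visible inside a container. This is a minimal illustration, not the authors' code; the window length, channel count, class count, and per-device batch size are illustrative assumptions.

```python
# Minimal sketch of data-parallel LSTM training for HAR (assumed
# configuration, not the paper's actual model or hyperparameters).
import tensorflow as tf

WINDOW = 128    # hypothetical: timesteps per sliding sensor window
FEATURES = 9    # hypothetical: e.g., tri-axial accel/gyro channels
CLASSES = 6     # hypothetical: number of activity labels

# MirroredStrategy replicates the model on every visible GPU and
# all-reduces gradients each step -- the data-parallel pattern.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Scale the global batch with the replica count so each device
# keeps a constant per-device batch size.
GLOBAL_BATCH = 64 * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(WINDOW, FEATURES)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# x_train: (num_windows, WINDOW, FEATURES); y_train: integer labels.
# model.fit(x_train, y_train, batch_size=GLOBAL_BATCH, epochs=20)
```

In a Kubernetes deployment, each training pod would typically request its GPUs through the device plugin and run this script unchanged, since MirroredStrategy discovers the devices exposed to the container at startup.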
