SMOTE and Gaussian Noise Based Sensor Data Augmentation

Deep learning achieves strong predictive performance by training multilayer neural network models on large amounts of data. One of the most effective ways to improve the performance of artificial neural networks is to add more data to the training set. Several data augmentation techniques have been developed in the literature for this purpose, and they are widely used in image processing. For example, a training set can be enriched with copies of its images viewed from different angles, yielding enough data to learn a model that generalizes. In this study, we focused on augmenting sensor data by applying image data augmentation methods. A data set containing temperature, humidity, light, and air quality sensor readings was augmented using two techniques, SMOTE Regression and Gaussian Noise. We then investigated how each technique affects the performance of an LSTM model that estimates missing or incorrect values. To evaluate the impact of each technique, we compared the RMSE values obtained with the real data against those obtained with the augmented data. Among the models built from the data set, the air quality model achieved the best test estimation results. In addition, a comparison of the two techniques showed that SMOTE Regression gave better results.
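The two augmentation techniques compared above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The function names (`gaussian_noise_augment`, `smoter_oversample`, `rmse`) are hypothetical, and so are the parameters `noise_scale`, `k`, the rare-case threshold and the synthetic data. The abstract does not report any of these settings. `smoter_oversample` is a simplified form of SMOTE for Regression (SMOTER, Torgo et al.). It interpolates between a rare case and one of its nearest rare neighbours and sets the new target to the inverse-distance-weighted average of the two parents' targets. The relevance function and the undersampling of common cases used in the original method are omitted. Windowing the sensor series and training the LSTM are also out of scope here.

```python
import numpy as np


def gaussian_noise_augment(X, y, n_copies=1, noise_scale=0.05, rng=None):
    """Append n_copies jittered copies of every row of X (targets copied as-is).

    Per-feature noise ~ N(0, (noise_scale * feature_std)^2); noise_scale is an
    assumed hyperparameter, not a value taken from the study.
    """
    rng = np.random.default_rng(rng)
    std = X.std(axis=0)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, noise_scale * std, size=X.shape))
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)


def smoter_oversample(X, y, rare_mask, n_synthetic, k=5, rng=None):
    """Simplified SMOTER: oversample rare cases only (no relevance function,
    no undersampling of common cases)."""
    rng = np.random.default_rng(rng)
    X_rare, y_rare = X[rare_mask], y[rare_mask]
    m = len(X_rare)
    if m < 2:
        raise ValueError("need at least two rare cases to interpolate")
    k = min(k, m - 1)
    # Pairwise Euclidean distances between rare cases; a case is not its own neighbour.
    d = np.linalg.norm(X_rare[:, None, :] - X_rare[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]

    X_syn = np.empty((n_synthetic, X.shape[1]))
    y_syn = np.empty(n_synthetic)
    for i in range(n_synthetic):
        s = rng.integers(m)                    # random rare seed case
        nb = neighbours[s, rng.integers(k)]    # one of its k nearest rare neighbours
        x_new = X_rare[s] + rng.random() * (X_rare[nb] - X_rare[s])
        d_seed = np.linalg.norm(x_new - X_rare[s])
        d_nb = np.linalg.norm(x_new - X_rare[nb])
        if d_seed + d_nb == 0:                 # identical parents
            y_new = (y_rare[s] + y_rare[nb]) / 2
        else:                                  # the closer parent gets more weight
            y_new = (d_nb * y_rare[s] + d_seed * y_rare[nb]) / (d_seed + d_nb)
        X_syn[i], y_syn[i] = x_new, y_new
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


def rmse(y_true, y_pred):
    """Root-mean-square error, the metric used to compare the models."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Usage with synthetic stand-in data (hypothetical, not the study's data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # stand-ins for temperature, humidity, light
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 0.1, 200)  # stand-in air-quality target
rare = y > np.quantile(y, 0.9)  # assumed rare-case threshold: top 10% of targets
X_g, y_g = gaussian_noise_augment(X, y, n_copies=1, rng=1)
X_s, y_s = smoter_oversample(X, y, rare, n_synthetic=100, rng=2)
print(X_g.shape, X_s.shape)  # (400, 3) (300, 3)
```

Because each synthetic point lies on the segment between its two parents, the distance weighting reduces to linear interpolation of the targets. The new targets therefore stay within the range of the rare cases. Gaussian noise, by contrast, perturbs all samples and leaves the target distribution unchanged.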
