Density-Based Data Selection and Management for Edge Computing

The widespread adoption of IoT devices has made it possible to acquire enormous amounts of real-time sensor data. Due to the explosive increase in sensing data volume, it has become difficult to collect and process all the data in a single central location. On the one hand, storing and processing data on edge devices, so-called edge computing, is becoming increasingly important. On the other hand, edge devices usually have only limited computing and memory resources, so it is not practical to process and store all the acquired data. There is therefore a strong demand for effectively selecting which data to process on an edge device or to transfer to a cloud server. In this paper, we propose an efficient density-based data selection and management method called O-D2M, with which edge devices retain the data that represent the inherent data distribution. We use a low-cost graph-based algorithm to analyze the trend and density of the input data. We evaluate the effectiveness of the proposed O-D2M against other methods in terms of the accuracy of machine learning models trained on the selected data. Through the evaluation, we confirm that O-D2M achieves higher accuracy and lower computation cost while reducing the amount of data to be processed or transferred by up to 20 points.
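As a rough illustration of density-based data selection, the sketch below keeps a budget-limited subset of samples that follows the underlying distribution, using a k-nearest-neighbor graph as the density proxy. The abstract does not specify O-D2M's actual graph construction or selection rule, so the functions `knn_density` and `select_representatives`, their parameters, and the coverage-radius heuristic are illustrative assumptions rather than the proposed method itself.

```python
# Hypothetical sketch of density-based sample selection for an edge device.
# Density is approximated from a k-nearest-neighbor graph; the densest,
# mutually non-redundant points are kept as representatives of the
# inherent data distribution.

import numpy as np


def knn_density(X, k=5):
    """Relative density as the inverse mean distance to the k nearest
    neighbors (a common graph-based density proxy)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distance
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances per point
    return 1.0 / (knn.mean(axis=1) + 1e-12)


def select_representatives(X, budget, k=5):
    """Greedily keep the densest points, skipping any point that lies
    within the coverage radius of an already selected point, so the
    kept subset is small but spread over the dense regions."""
    density = knn_density(X, k)
    order = np.argsort(-density)             # densest first
    full = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    radius = np.median(np.sort(full, axis=1)[:, 1:k + 1])  # heuristic radius
    selected = []
    for i in order:
        if len(selected) >= budget:
            break
        if all(np.linalg.norm(X[i] - X[j]) > radius for j in selected):
            selected.append(i)
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic clusters standing in for streamed sensor readings.
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
    idx = select_representatives(X, budget=40)
    print("kept", len(idx), "of", len(X), "samples")
```

In this toy setting, the selected subset could then be stored locally or forwarded to a cloud server in place of the full stream; the quadratic distance computation here is only for clarity and would need an incremental, lightweight graph structure on a real edge device.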
