Error-Aware Data Clustering for In-Network Data Reduction in Wireless Sensor Networks

A wireless sensor network (WSN) deploys hundreds or thousands of nodes that can generate large volumes of data over time. Handling this volume of collected data is a real challenge for energy-constrained sensor nodes. Numerous research works have therefore designed data clustering techniques for WSNs that reduce the amount of redundant data before it is transmitted to the sink while preserving its fundamental properties. This paper develops a new error-aware data clustering (EDC) technique at the cluster-heads (CHs) for in-network data reduction. The proposed EDC consists of three adaptive modules, allowing users to choose the module that suits their requirements and the quality of the data. The histogram-based data clustering (HDC) module groups temporally correlated data into clusters and eliminates correlated data from each cluster. The recursive outlier detection and smoothing (RODS) with HDC module provides error-aware data clustering: it detects random outliers using the temporal correlation of the data in order to keep data reduction errors within a predefined threshold. The verification of RODS (V-RODS) with HDC module detects both random and frequent outliers simultaneously, based on the temporal and spatial correlations of the data. The simulation results show that the proposed EDC is computationally cheap, reduces a significant amount of redundant data with minimal error, and provides efficient error-aware data clustering solutions for remote environmental monitoring applications.
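The general idea behind histogram-based data reduction can be illustrated with a minimal sketch. This is not the paper's HDC algorithm, whose details are not given in the abstract. It assumes equal-width bins, one mean value transmitted per non-empty bin, and a maximum absolute error as the reduction error; the function names and the `bin_width` parameter are hypothetical.

```python
from statistics import mean


def histogram_cluster(readings, bin_width):
    """Group readings into equal-width histogram bins (assumed scheme).

    Returns a dict mapping bin index -> list of readings in that bin.
    """
    lowest = min(readings)
    clusters = {}
    for x in readings:
        idx = int((x - lowest) // bin_width)
        clusters.setdefault(idx, []).append(x)
    return clusters


def reduce_readings(readings, bin_width):
    """Replace each cluster by its mean, so one value per bin is sent."""
    clusters = histogram_cluster(readings, bin_width)
    return {idx: mean(values) for idx, values in clusters.items()}


def max_abs_error(readings, bin_width):
    """Largest deviation between a reading and its cluster representative."""
    lowest = min(readings)
    reps = reduce_readings(readings, bin_width)
    return max(
        abs(x - reps[int((x - lowest) // bin_width)]) for x in readings
    )


if __name__ == "__main__":
    # Hypothetical temperature readings collected at a cluster-head.
    readings = [20.0, 20.1, 20.2, 25.0, 25.1]
    print(reduce_readings(readings, bin_width=1.0))
    print(max_abs_error(readings, bin_width=1.0))
```

In this sketch the error of any reading is bounded by the bin width, so `bin_width` plays the role of a user-chosen error threshold. An error-aware scheme such as the one described above would additionally detect and smooth outliers before clustering, which is not shown here.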
