A Correlation-Change Based Feature Selection Method for IoT Equipment Anomaly Detection

Selecting the right features for further data analysis is important in equipment anomaly detection, especially when the original data source is high-dimensional with a low value density. However, existing research fails to capture the fact that sensor data are usually correlated (e.g., from redundantly deployed sensors), and that these correlations are broken when anomalies occur in the monitored equipment. In this paper, we propose to capture such changes in sensor data correlation to improve the performance of IoT (Internet of Things) equipment anomaly detection. In our feature selection method, we first cluster correlated sensors together according to their data correlations in order to recognize redundantly deployed sensors, and we then monitor the data correlations in real time and select the sensors whose correlations change as the representative features for anomaly detection. To that end, (1) we conduct curve alignment for the sensor clustering; (2) we discuss the appropriate window size for the data correlation calculation; and (3) we incorporate MCFS (Multi-Cluster Feature Selection) into our method to suit the online feature selection scenario. In an experimental evaluation on data from real IoT equipment, we show that our method reduces the false negatives of IoT equipment anomaly detection by 30% while keeping false positives at almost the same level.
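The core idea, selecting sensors whose previously strong correlations break down, can be illustrated with a minimal sketch. It is not the method of the paper: it omits curve alignment and MCFS, uses plain Pearson correlation, and the function names, the thresholds `min_corr` and `max_corr`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def correlated_pairs(reference, min_corr=0.9):
    """Find sensor pairs (i, j), i < j, with |Pearson r| >= min_corr.

    `reference` has shape (samples, sensors) and is assumed to come
    from normal operation. `min_corr` is an illustrative threshold.
    """
    corr = np.corrcoef(reference, rowvar=False)
    n = corr.shape[0]
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if abs(corr[i, j]) >= min_corr]


def sensors_with_broken_correlation(window, pairs, max_corr=0.5):
    """Return sensors belonging to a pair whose |r| fell below max_corr.

    `window` holds the most recent samples, shape (samples, sensors).
    `max_corr` is an illustrative threshold.
    """
    corr = np.corrcoef(window, rowvar=False)
    broken = set()
    for i, j in pairs:
        if abs(corr[i, j]) < max_corr:
            broken.update((i, j))
    return sorted(broken)


# Synthetic data: sensors 0 and 1 observe the same signal, sensor 2 is unrelated.
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 400)
signal = np.sin(t)
reference = np.column_stack([
    signal + 0.05 * rng.standard_normal(t.size),
    signal + 0.05 * rng.standard_normal(t.size),
    rng.standard_normal(t.size),
])
pairs = correlated_pairs(reference)

# Simulated fault: sensor 1 stops following the shared signal.
faulty = reference.copy()
faulty[:, 1] = rng.standard_normal(t.size)

normal_selection = sensors_with_broken_correlation(reference, pairs)
faulty_selection = sensors_with_broken_correlation(faulty, pairs)
print(pairs, normal_selection, faulty_selection)
```

In an online setting, the second function would be called on each new sliding window. The window length trades responsiveness against the stability of the correlation estimate, which is the window-size question the abstract raises.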
