Numerical sensitive data recognition based on hybrid gene expression programming for active distribution networks

Abstract
Complex and flexible access modes and frequent data interaction pose serious security risks to data transmission in active distribution networks, and ensuring data security is critical to their safe and stable operation. Traditional methods, such as access control, data encryption, and text filtering based on intelligent algorithms, struggle to secure the transmission of dynamically growing, high-dimensional numerical data in active distribution networks. In this paper, we first propose a rough feature selection algorithm based on the average importance measurement (RFS-AIM) to reduce the complexity of data recognition. Then, we propose a sensitive data recognition function mining algorithm based on RFS-AIM and improved gene expression programming (SDR-IGEP), in which the population update operation is built on chromosome similarity measured by the Jaccard coefficient. This operation helps gene expression programming avoid premature convergence to local optima by increasing individual diversity in the new population. Finally, we present a new incremental mining algorithm for a sensitive data recognition function based on global function fitting (ISDR-GFF), which uses a grain granulation model for incremental datasets. Experimental results on IEEE benchmark datasets and real datasets show that the proposed algorithms outperform state-of-the-art algorithms in terms of average running time, precision, recall, F1 index, accuracy, specificity, and speedup on all experimental datasets.
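The abstract describes the SDR-IGEP population update only at a high level. The following is a minimal Python sketch of one way a Jaccard-based chromosome similarity could drive a diversity-preserving population update; it is an illustration under stated assumptions, not the paper's algorithm. The chromosome encoding (a linear GEP chromosome treated as the set of its (position, symbol) pairs), the greedy fitness-ordered selection, the `threshold` parameter, and the function names `chromosome_similarity` and `update_population` are all hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple


def jaccard(a: Set, b: Set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def chromosome_features(chrom: str) -> Set[Tuple[int, str]]:
    # Assumption: encode a linear GEP chromosome as its set of (position, symbol)
    # pairs, so two chromosomes overlap only where they carry the same symbol
    # at the same locus.
    return set(enumerate(chrom))


def chromosome_similarity(c1: str, c2: str) -> float:
    """Hypothetical chromosome similarity: Jaccard coefficient of the encodings."""
    return jaccard(chromosome_features(c1), chromosome_features(c2))


def update_population(candidates: Sequence[str],
                      fitness: Callable[[str], float],
                      size: int,
                      threshold: float = 0.8) -> List[str]:
    """Hypothetical diversity-preserving update.

    Candidates are visited in descending fitness order; a candidate is kept only
    if its similarity to every already-kept individual is at most `threshold`.
    If too few sufficiently different candidates exist, the remaining slots are
    filled with the skipped candidates, best first.
    """
    ranked = sorted(candidates, key=fitness, reverse=True)
    kept: List[str] = []
    skipped: List[str] = []
    for c in ranked:
        if len(kept) == size:
            break
        if all(chromosome_similarity(c, k) <= threshold for k in kept):
            kept.append(c)
        else:
            skipped.append(c)
    # Top up with near-duplicates only when diversity alone cannot fill the population.
    for c in skipped:
        if len(kept) == size:
            break
        kept.append(c)
    return kept


if __name__ == "__main__":
    scores = {"+*abc": 0.9, "+*abd": 0.85, "-a/cb": 0.6}
    print(update_population(list(scores), lambda c: scores[c], size=2, threshold=0.5))
```

With a threshold of 0.5, `+*abd` is passed over in favour of the less fit but structurally different `-a/cb`, because it shares four of five loci with `+*abc` (Jaccard coefficient 4/6, about 0.67), so the script prints `['+*abc', '-a/cb']`. Other encodings, such as symbol n-grams or the decoded expression trees, would be equally plausible inputs to the Jaccard coefficient.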
