Big data analytics for identifying electricity theft using machine learning approaches in microgrids for smart communities

Electricity theft (ET) causes major revenue losses for power utilities. It reduces the quality of supply, raises production costs, forces legitimate consumers to bear higher costs, and harms the economy as a whole. In this article, we use the State Grid Corporation of China (SGCC) dataset, which contains 1035 days of electricity consumption data for two classes: normal and fraudulent. We propose an ET detection model that consists of four steps: interpolation, data balancing, feature extraction, and classification. First, missing values in the dataset are recovered using an interpolation method. Second, a resampling technique is applied. ET consumers make up only 9% of the SGCC dataset, and this imbalance makes it hard for a model to classify both classes (normal and theft) correctly. We therefore propose a hybrid resampling technique that combines the synthetic minority oversampling technique (SMOTE) with near miss. Third, a residual network (ResNet) extracts latent features from the SGCC dataset. Fourth, three tree-based classifiers, namely decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost), are trained on the encoded feature vectors for classification. Moreover, searching for good hyperparameters is a challenging task that is usually done manually and takes considerable time. To address this problem, a Bayesian optimizer is used to simplify the tuning of DT, RF, and AdaBoost. Finally, the results indicate that RF outperforms DT and AdaBoost.
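The first step, recovering missing readings, can be illustrated with a short Python sketch. It assumes plain linear interpolation between the nearest known days of one consumer's series; the exact interpolation rule applied to the SGCC data is not specified above, and the function name `interpolate_missing` is hypothetical.

```python
import numpy as np

def interpolate_missing(series):
    """Fill NaN daily readings by linear interpolation between the nearest known days.

    Assumption: plain linear interpolation; the paper's exact rule is not given.
    """
    x = np.array(series, dtype=float)  # copy so the caller's data is not modified
    missing = np.isnan(x)
    if missing.all():
        return np.zeros_like(x)  # assumption: no known readings -> treat as zero usage
    days = np.arange(len(x))
    # np.interp repeats the first/last known value for gaps at the series edges
    x[missing] = np.interp(days[missing], days[~missing], x[~missing])
    return x

daily_kwh = [3.0, np.nan, 5.0, np.nan, np.nan, 8.0, np.nan]
print(interpolate_missing(daily_kwh))  # [3. 4. 5. 6. 7. 8. 8.]
```

Gaps inside the series are bridged by a straight line, while a trailing gap simply repeats the last known reading.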
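The hybrid resampling step can be sketched in NumPy as below. This is an illustrative reimplementation, not the authors' code. It assumes that SMOTE interpolates between a theft sample and one of its k nearest theft neighbours, that the NearMiss-1 variant is used (keep the normal samples whose mean distance to their k nearest theft samples is smallest), and that both classes are brought to the same hypothetical size `target_per_class`. Label 1 marks theft and label 0 marks normal consumption; all function names are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling: n_new synthetic points between minority neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position on the segment base -> neighbour
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def near_miss(X_maj, X_min, n_keep, k=3):
    """NearMiss-1 (assumed variant): keep majority samples nearest to the minority class."""
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    k = min(k, len(X_min))
    avg = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return X_maj[np.argsort(avg)[:n_keep]]

def smote_near_miss(X, y, target_per_class, rng=0):
    """Hypothetical hybrid: oversample theft (1) and undersample normal (0) to equal size."""
    X_min, X_maj = X[y == 1], X[y == 0]
    n_new = max(target_per_class - len(X_min), 0)
    X_min_bal = np.vstack([X_min, smote(X_min, n_new, rng=rng)])
    # assumption: NearMiss distances are measured to the original theft samples only
    X_maj_bal = near_miss(X_maj, X_min, min(target_per_class, len(X_maj)))
    X_out = np.vstack([X_maj_bal, X_min_bal])
    y_out = np.concatenate([np.zeros(len(X_maj_bal), int), np.ones(len(X_min_bal), int)])
    return X_out, y_out

rng = np.random.default_rng(42)
X = rng.normal(size=(110, 4))
y = np.array([0] * 100 + [1] * 10)  # about 9% theft, mirroring the SGCC imbalance
Xb, yb = smote_near_miss(X, y, target_per_class=40)
print(Xb.shape, np.bincount(yb))  # (80, 4) [40 40]
```

Oversampling alone would leave many redundant normal profiles, and undersampling alone would discard most of the data; meeting in the middle keeps the balanced set moderately sized. For real experiments, the imbalanced-learn library provides tested `SMOTE` and `NearMiss` classes; the hand-written version only shows the mechanics.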
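The feature-extraction step relies on residual learning, in which each block adds a learned correction F(x) to an identity shortcut, so the block only has to model the difference from its input. The sketch below shows that mechanism alone, under simplifying assumptions: fully connected layers instead of convolutions, no batch normalization, untrained random weights, and a hypothetical 7-day input. It is not the network architecture used in the paper, and `residual_block` and `encode` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Dense residual block: relu(x + F(x)) with F(x) = W2 @ relu(W1 @ x + b1) + b2."""
    f = W2 @ relu(W1 @ x + b1) + b2  # residual mapping F(x)
    return relu(x + f)               # identity shortcut added before the final activation

def encode(x, blocks):
    """Stack residual blocks to map a consumption vector to a latent feature vector."""
    for params in blocks:
        x = residual_block(x, *params)
    return x

rng = np.random.default_rng(0)
d, h = 7, 16  # hypothetical: 7 daily readings in, hidden width 16
# assumption: untrained random weights, for illustration only
blocks = [(rng.normal(scale=0.1, size=(h, d)), np.zeros(h),
           rng.normal(scale=0.1, size=(d, h)), np.zeros(d)) for _ in range(2)]
week = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 8.0])
features = encode(week, blocks)
print(features.shape)  # (7,)
```

If F(x) is zero, a block reduces to relu(x), which for non-negative consumption is the identity; this is why stacking residual blocks does not make the encoder harder to train than a shallower one.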
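Bayesian optimization replaces manual tuning by fitting a surrogate model to the scores observed so far and evaluating next wherever an acquisition function is highest. The paper does not state its surrogate or acquisition function, so the sketch below assumes a Gaussian-process surrogate with an RBF kernel and the expected-improvement criterion, tuning a single integer hyperparameter such as a tree's `max_depth`. `validation_score` is a hypothetical stand-in for a cross-validated metric of DT, RF, or AdaBoost, and the other function names are also hypothetical.

```python
import math
import numpy as np

def rbf(a, b, length=3.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6, length=3.0):
    """Zero-mean GP posterior (RBF kernel) fitted to standardized scores."""
    mu_y, sd_y = y_obs.mean(), y_obs.std() + 1e-12
    y = (y_obs - mu_y) / sd_y
    L = np.linalg.cholesky(rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs)))
    K_s = rbf(x_obs, x_new, length)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu_y + sd_y * (K_s.T @ alpha), sd_y * np.sqrt(var)

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best score so far (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf

def bayes_opt(objective, candidates, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    x_obs = rng.choice(candidates, size=n_init, replace=False)  # random warm start
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        untried = np.setdiff1d(candidates, x_obs)
        if untried.size == 0:
            break
        mean, std = gp_posterior(x_obs, y_obs, untried)
        x_next = untried[np.argmax(expected_improvement(mean, std, y_obs.max()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    i = int(np.argmax(y_obs))
    return x_obs[i], y_obs[i]

def validation_score(max_depth):
    """Hypothetical stand-in for a cross-validated score at a given max_depth."""
    return 0.9 - ((max_depth - 12) / 10) ** 2

best_depth, best_score = bayes_opt(validation_score, candidates=range(1, 31))
print(best_depth, best_score)
```

Because every real evaluation means training a classifier, the surrogate lets the search skip most of the candidate grid; libraries such as scikit-optimize offer ready-made implementations that also handle several hyperparameters at once.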
