Impact of Feature Selection on Non-technical Loss Detection

Over the years, many countries have faced huge financial deficits due to Non-Technical Loss (NTL) in the power sector. Electricity can be used illegally in many ways, such as bypassing or reversing meters. There have been many attempts to bring down NTL using manual and automated techniques. Manual NTL detection has not proved fruitful, as it incurs heavy costs and has a low hit ratio. Because of these shortcomings, automated NTL detection using machine learning classifiers is gaining attention in the research community. Datasets containing NTL belong to the class imbalance domain, where regular consumers (negative class) greatly outnumber irregular consumers (positive class). Many techniques have been proposed for identifying the right number of representative records, but selecting the right features for deciding NTL is an equally important task to which the literature has contributed little. In this paper, we propose the Incremental Feature Selection (IFS) algorithm, which first uses feature importance to identify the most relevant features for NTL detection; these features are then used to test three classifiers for NTL detection, namely CatBoost, Decision Tree (DT) Classifier and K-Nearest Neighbors (KNN). In this way, we not only identify the most relevant features for NTL detection in a real dataset but also bring down the overall computation time of the classifiers. Moreover, our proposed framework is tested on three performance evaluation metrics used in the imbalance domain. The results show that, using the most relevant features identified by the IFS algorithm, the three classifiers perform as well as or slightly better than they do with all features.
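The general idea of ranking features by importance and then adding them one at a time can be illustrated with a minimal sketch. This is not the IFS algorithm from the paper, whose details the abstract does not give. Everything here is an assumption made for illustration: the importance score (a simple class-mean gap), the stopping rule (keep the prefix with the best validation F1), the synthetic 10%-positive dataset, and all function names. A plain KNN stands in for the three classifiers.

```python
import random
from statistics import mean, pstdev


def importance(column, labels):
    """Assumed filter-style score: gap between class means, scaled by overall spread."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    spread = pstdev(column) or 1.0  # guard against a constant column
    return abs(mean(pos) - mean(neg)) / spread


def knn_predict(train_X, train_y, row, cols, k=5):
    """Majority vote among the k nearest training rows, using only the columns in `cols`."""
    dists = sorted(
        (sum((a[c] - row[c]) ** 2 for c in cols), y)
        for a, y in zip(train_X, train_y)
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if votes * 2 > k else 0


def f1_score(y_true, y_pred):
    """F1 for the positive (irregular consumer) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def incremental_feature_selection(train_X, train_y, val_X, val_y):
    """Hypothetical helper: rank features, grow the subset one feature at a time,
    and return the top-ranked prefix with the best validation F1."""
    n_features = len(train_X[0])
    scores = [importance([r[j] for r in train_X], train_y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)

    best_score, best_k, history = -1.0, 0, []
    for k in range(1, n_features + 1):
        cols = order[:k]
        preds = [knn_predict(train_X, train_y, row, cols) for row in val_X]
        score = f1_score(val_y, preds)
        history.append(score)
        if score > best_score:  # strict: ties keep the smaller subset
            best_score, best_k = score, k
    return order[:best_k], best_score, history


# Synthetic, imbalanced toy data (assumption): 10% positives,
# features 0 and 1 carry signal, features 2-5 are pure noise.
random.seed(0)
labels = [1 if i % 10 == 0 else 0 for i in range(300)]


def make_row(label):
    shift = 2.0 if label == 1 else 0.0
    signal = [random.gauss(shift, 1.0), random.gauss(shift, 1.0)]
    noise = [random.gauss(0.0, 1.0) for _ in range(4)]
    return signal + noise


rows = [make_row(y) for y in labels]
train_X, train_y = rows[:200], labels[:200]
val_X, val_y = rows[200:], labels[200:]

selected, best, history = incremental_feature_selection(train_X, train_y, val_X, val_y)
print("selected features:", selected)
print("best validation F1:", round(best, 3))
```

F1 on the positive class is used here because plain accuracy is misleading under class imbalance; the three metrics actually used in the paper are not named in the abstract.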
