Neighborhood Features Help Detecting Electricity Theft in Big Data Sets

Electricity theft is a major problem in both developed and developing countries and may account for up to 40% of the total electricity distributed. More generally, electricity theft is a type of non-technical loss (NTL), that is, a loss that occurs during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell, we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of the generated features and show why they are useful for predicting NTL. In addition, we compute features from the customers' consumption time series and use master data features, such as customer class and connection voltage. We compute these features for a Big Data set of 31M meter readings, 700K customers and 400K inspection results. We then use them to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Compared with analyzing the time series alone, adding the neighborhood features yields appreciable results on Big Data sets for NTL proportions varying from 1% to 90%. This work can therefore be deployed in a wide range of regions around the world.
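The grid-based neighborhood features described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `neighborhood_features`, the `(x, y, inspected, ntl)` tuple layout, the planar coordinates, the square cells and the example cell sizes are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor


def neighborhood_features(customers, cell_size):
    """Return one (inspected_share, ntl_share) pair per customer.

    customers: list of (x, y, inspected, ntl) tuples (hypothetical layout).
    cell_size: edge length of a square grid cell, in the units of x and y.
    """
    # cell -> [number of customers, number inspected, number of NTL found]
    counts = defaultdict(lambda: [0, 0, 0])
    cells = []
    for x, y, inspected, ntl in customers:
        # Assign the customer to a grid cell by flooring its coordinates.
        cell = (floor(x / cell_size), floor(y / cell_size))
        cells.append(cell)
        counts[cell][0] += 1
        if inspected:
            counts[cell][1] += 1
            if ntl:
                counts[cell][2] += 1

    features = []
    for cell in cells:
        total, n_inspected, n_ntl = counts[cell]
        inspected_share = n_inspected / total
        # Assumption: cells without any inspection get an NTL share of 0.
        ntl_share = n_ntl / n_inspected if n_inspected else 0.0
        features.append((inspected_share, ntl_share))
    return features


# Toy data: five customers, three of them inspected, one NTL found.
customers = [
    (0.1, 0.1, True, True),
    (0.2, 0.2, True, False),
    (0.3, 0.3, False, False),
    (1.5, 1.5, True, False),
    (1.6, 1.6, False, False),
]

# One feature pair per customer for each (assumed) grid size.
features_by_size = {size: neighborhood_features(customers, size) for size in (1.0, 2.0)}
```

Computing the pair for several cell sizes gives each customer a small vector of neighborhood features at different spatial resolutions. Note that this sketch counts a customer's own inspection result in its cell statistics; when such features are used to train a classifier, the label being predicted would normally have to be kept out of them.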
