Improving Electric Fraud Detection using Class Imbalance Strategies

Improving nontechnical loss detection is a major challenge for electric companies. The large number of clients and the diversity of fraud types make this a very complex task. In this paper we present a fraud detection strategy based on class imbalance research. An automatic detection tool combining classification strategies is proposed. Individual classifiers such as One-Class SVM, Cost-Sensitive SVM (CS-SVM), Optimum Path Forest (OPF) and C4.5 Tree, as well as combination functions, are designed taking special care of the class-imbalanced nature of the data. Analysis of consumers' historical kWh load profile data from the Uruguayan Electric Company (UTE) shows that using combination and balancing techniques improves automatic detection performance.
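To make the idea of combining classifiers under class imbalance concrete, the following is a minimal sketch, not the combination function used in the paper. It assumes each individual classifier outputs a binary label (1 = suspected fraud, 0 = normal), combines them with a hypothetical weighted vote, and scores the result with precision, recall and F-measure on the minority (fraud) class, since plain accuracy is misleading when fraud cases are rare. The function names, weights, threshold and toy labels are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float],
                  threshold: float = 0.5) -> List[int]:
    """Combine binary predictions (1 = suspected fraud) by weighted vote.

    Hypothetical combination rule for illustration only: a customer is
    flagged when the weighted share of classifiers voting 1 reaches
    `threshold`.
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        score = sum(w * p[i] for w, p in zip(weights, predictions)) / total
        combined.append(1 if score >= threshold else 0)
    return combined


def minority_scores(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F-measure for the minority (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Made-up toy labels (not data from the paper): 10 customers, 2 fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Made-up outputs of three hypothetical individual classifiers.
clf_a = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
clf_b = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1]
clf_c = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

combined = weighted_vote([clf_a, clf_b, clf_c], weights=[1.0, 1.0, 1.0])
print("classifier A:", minority_scores(y_true, clf_a))
print("combined    :", minority_scores(y_true, combined))
```

In this toy setting each classifier makes different mistakes, so the vote cancels the isolated false alarms and misses. Real ensembles benefit only to the extent that the individual classifiers are diverse, and the weights would normally be tuned on validation data using a minority-class measure such as the F-measure.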
