A Data Mining Method Based on the Variability of the Customer Consumption - A Special Application on Electric Utility Companies

This paper describes a method for recovering electrical energy lost through abnormality or fraud by means of a data mining analysis based on outlier detection. It provides a general methodology for obtaining a list of abnormal users; all the input information it needs is taken exclusively from the general customer databases. The method has been successfully applied to detect abnormalities and fraud in customer consumption. We present a real study and include a number of examples of abnormal patterns.
1 THE NATURE OF ELECTRICAL UTILITY ANOMALIES
According to electrical utilities, a non-technical loss is any consumed energy or service that is not billed because of measuring equipment failure or ill-intentioned, fraudulent manipulation of that equipment. Detection of non-technical losses therefore includes detection of fraudulent users (Artís et al., 1999).
All our data are drawn from Endesa databases, with permission. In particular, the data in this paper come from two representative customer sectors: private customers and lodging sector customers. We selected these two samples because both sectors have a historically high rate of non-technical losses, fraud and anomalies, yet very different consumption habits, which makes them a suitable test of the mining method.
2 THE STATISTICAL APPROACH TO OUTLIERS DETECTION
Very often, there exist data objects that do not comply with the general behavior of the data. Such data objects, which are grossly different from or inconsistent with the remaining data, are called outliers. Data mining is being applied in many fields, and detection of non-technical losses is one field in which it has met with recent success (Kou et al., 2004), (Daskalaki et al., 2003), (Editorial, 2006). Considerable progress has been made in identifying fraud by mining methods (Kirkos et al., 2007), (Wheeler and Aitken, 2000).
The method proposed in this paper is based on outlier detection and provides a general methodology for obtaining a list of abnormal users using only the general customer databases as input. It has been successfully applied to detect inconsistencies and fraud in customer energy consumption. Outliers can be caused by measurement error or by fraud in customer consumption. Alternatively, they may be the result of inherent data variability. Outlier detection and analysis is thus an interesting data mining task, referred to as outlier mining. The advantages of the proposed algorithm over existing technology are:
• The elimination (or, at least, reduction) of the temporal component and the local geographical component of customer consumption. With these components removed, outliers can be attributed to measurement errors rather than to the inherent data variability.
• The study of comparative consumption among clients with similar characteristics. This approach is based on the observation that fraudsters seldom change their consumption habits (Artís et al., 2000). They are closely linked to other fraudsters, but not to the rest of the customers.
• Classification methods are particularly useful when a database contains examples that can be used as the basis for future decision making (supervised methods). Researchers have therefore focused on different types of classification algorithms, including nearest neighbor (He et al., 1997), (He et al., 1999), decision tree induction, error back propagation (Brockett et al., 1998), (Brause et al., 1999), reinforcement learning and rule learning. The outlier-based data mining method presented here is, by contrast, unsupervised. It does not require confidence about the true classes of the original data used to build the models, and it can be used to detect frauds or errors of a type that has not previously occurred.
• The use of a simple tool, developed for mining very large data sets.
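The first two advantages listed above (reducing the temporal and geographical components, and comparing customers with similar characteristics) can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout `(customer_id, zone, period, kwh)`, the function name `relative_consumption` and the choice of the peer-group median as the reference level are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def relative_consumption(records):
    """Express each reading relative to its peer group.

    records: iterable of (customer_id, zone, period, kwh) tuples
             (a hypothetical layout, not the utility's real schema).
    Returns {customer_id: [kwh / median kwh of the same zone and period, ...]}.
    """
    # Collect all readings that share a zone and a billing period.
    by_group = defaultdict(list)
    for _cust, zone, period, kwh in records:
        by_group[(zone, period)].append(kwh)
    peer_median = {group: median(values) for group, values in by_group.items()}

    # Dividing by the group median removes the level shared by the group,
    # i.e. the seasonal and local part of the consumption.
    ratios = defaultdict(list)
    for cust, zone, period, kwh in records:
        ref = peer_median[(zone, period)]
        if ref > 0:  # skip groups without a meaningful reference level
            ratios[cust].append(kwh / ref)
    return dict(ratios)


records = [
    ("A", "north", "2005-01", 300.0),
    ("B", "north", "2005-01", 310.0),
    ("C", "north", "2005-01", 30.0),
    ("A", "north", "2005-07", 150.0),
    ("B", "north", "2005-07", 160.0),
    ("C", "north", "2005-07", 16.0),
]
ratios = relative_consumption(records)
```

In this toy data every customer's raw consumption halves between winter and summer, but the ratios stay roughly constant: customers A and B remain close to 1, while customer C stays near 0.1 in both periods and stands out from its peers regardless of the season.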
The statistical approach to outlier detection assumes a distribution or probability model for the given data set and then identifies outliers with respect to the model using a discordance test (Barao and Tawn, 1999), (Cabral et al., 2004). Applying the test requires knowledge of the data set parameters (such as the assumed data distribution), of the distribution parameters (such as the mean and variance) and, above all, of the inherent data variability (Kantardzic, 1991).
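As a rough illustration of a discordance test, the sketch below assumes a normal model and flags readings whose standardized distance from the mean exceeds a threshold. The function name `discordant`, the threshold of 3 and the sample values are assumptions chosen for the example; the paper does not specify which test or parameters it uses.

```python
from statistics import mean, stdev


def discordant(values, threshold=3.0, mu=None, sigma=None):
    """Return the indices of values that are discordant under a normal model.

    mu and sigma are the assumed distribution parameters. If they are not
    supplied, they are estimated from the sample itself.
    """
    mu = mean(values) if mu is None else mu
    sigma = stdev(values) if sigma is None else sigma
    if sigma == 0:
        return []  # no variability, so nothing can be discordant
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]


# Hypothetical monthly readings (kWh) for a group of similar customers.
readings = [98.0, 102.0, 100.0, 101.0, 99.0, 100.0, 97.0, 103.0, 100.0, 5.0]

flagged_known = discordant(readings, mu=100.0, sigma=2.0)  # parameters known
flagged_estimated = discordant(readings)                   # parameters estimated
```

With the parameters supplied, only the last reading (index 9) is flagged. When the mean and standard deviation are instead estimated from the same ten readings, the extreme value inflates the estimated standard deviation and nothing exceeds the threshold, which shows in a small way why the test depends on prior knowledge of the distribution parameters and of the inherent variability.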

[1] J. Tawn, et al. Extremal analysis of short series with outliers: sea-levels and athletics records, 1999.

[2] Teresa F. Lunt, et al. A survey of intrusion detection techniques, 1993, Comput. Secur.

[3] S. Daskalaki, et al. Data mining for decision support on customer insolvency in telecommunications business, 2003, Eur. J. Oper. Res.

[4] Yannis Manolopoulos, et al. Data Mining techniques for the detection of fraudulent financial statements, 2007, Expert Syst. Appl.

[5] Tao Guo, et al. Neural data mining for credit card fraud detection, 2008, 2008 International Conference on Machine Learning and Cybernetics.

[6] John McCarthy, et al. Phenomenal data mining: from data to phenomena, 2000, SKDD.

[7] P. Brockett, et al. Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, 1998.

[8] Petra Perner, et al. Recent advances in data mining, 2006, Engineering applications of artificial intelligence.

[9] José Edison Cabral, et al. Fraud detection in electrical energy consumers using rough sets, 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[10] John Shawe-Taylor, et al. Detecting Cellular Fraud Using Adaptive Prototypes, 1997, AAAI 1997.

[11] Chang-Tien Lu, et al. Survey of fraud detection techniques, 2004, IEEE International Conference on Networking, Sensing and Control, 2004.

[12] Dorothy E. Denning, et al. An Intrusion-Detection Model, 1987, IEEE Transactions on Software Engineering.

[13] J. Stuart Aitken, et al. Multiple algorithms for fraud detection, 2000, Knowl. Based Syst.

[14] Hongxing He, et al. Application of neural networks to detection of medical fraud, 1997.

[15] Tom Fawcett, et al. Adaptive Fraud Detection, 1997, Data Mining and Knowledge Discovery.

[16] D. Edwards. Data Mining: Concepts, Models, Methods, and Algorithms, 2003.