Data quality of electricity consumption data in a smart grid environment

With the increasing penetration of traditional and emerging information technologies in the electric power industry, together with the rapid development of electricity market reform, the industry has accumulated a large amount of data. Data quality issues have become increasingly prominent; they affect the accuracy and effectiveness of electricity data mining and energy big data analytics, and they bear directly on the safety and reliability of power system operation and management that rely on data-driven decision support. In this paper, we study the data quality of electricity consumption data in a smart grid environment. First, we analyze the significance of data quality and explain the definition and classification of data quality issues. We then introduce the characteristics of electricity consumption data in a smart grid environment and analyze its data quality. The data quality issues of electricity consumption data are divided into three types: noise data, incomplete data, and outlier data. Each type is discussed in detail. Because outlier data is one of the most prominent issues in electricity consumption data, we focus mainly on outlier detection. We introduce the causes of outliers in electricity consumption data and illustrate their significance from both negative and positive perspectives. Finally, we review detection methods for electricity consumption outlier data, which fall into two main categories: data mining-based methods and state estimation-based methods.
