Robust and automatic data cleansing method for short-term load forecasting of distribution feeders

Distribution networks are undergoing fundamental changes at the medium-voltage level. A need has emerged for large numbers of short-term load forecasts to support growing planning and control decision-making. Data-driven modelling of medium-voltage feeders can be affected by (1) data quality issues, namely large gross errors and missing observations, and (2) structural breaks in the data caused by occasional network reconfiguration and load transfers. The present work investigates and reports on the effects of advanced data cleansing techniques on forecast accuracy. A hybrid framework to detect and remove outliers in large datasets is proposed; this automatic procedure combines the Tukey labelling rule with the binary segmentation algorithm to cleanse data more efficiently, and it is fast and easy to implement. Various approaches to missing value imputation are investigated, including unconditional mean, Hot Deck via k-nearest neighbour, and Kalman smoothing. Combinations of the automatic outlier detection/removal procedure and the imputation methods above are applied to cleanse the time series of 342 medium-voltage feeders. A nested rolling-origin validation technique is used to evaluate the feed-forward deep neural network models. The proposed data cleansing framework efficiently removes outliers from the data and improves forecast accuracy. Hot Deck (k-NN) imputation is found to perform best in balancing the bias-variance trade-off for short-term forecasting.
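The cleansing idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that structural breaks have already been located (for example by a binary segmentation step, which is not reproduced here) and are passed in as a hypothetical `breakpoints` argument. The Tukey labelling rule is then applied within each segment with the conventional multiplier k = 1.5, and flagged or missing values are filled with the segment mean as a stand-in for the imputation methods compared in the paper. The function names and the toy series are illustrative assumptions.

```python
import numpy as np

def tukey_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):  # comparisons against NaN yield False
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def cleanse(load, breakpoints=(), k=1.5):
    """Per segment: set Tukey outliers to NaN, then fill NaNs with the segment mean.

    `breakpoints` are assumed to be known indices where the load level shifts
    (hypothetical input; a change-point detector would normally supply them).
    """
    load = np.asarray(load, dtype=float).copy()
    edges = [0, *breakpoints, len(load)]
    for start, stop in zip(edges[:-1], edges[1:]):
        seg = load[start:stop]                 # view into the copied array
        seg[tukey_outlier_mask(seg, k)] = np.nan
        seg[np.isnan(seg)] = np.nanmean(seg)   # simple mean imputation
    return load

# Toy feeder series: one gross error (500), one gap (NaN),
# and a level shift at index 8 (e.g. after a load transfer).
load = [10, 11, 9, 10, 500, 10, np.nan, 11, 30, 31, 29, 30, 31, 30]
cleaned = cleanse(load, breakpoints=(8,))
print(cleaned)
```

Applying the rule per segment keeps the fences from being widened by a level shift, which is one plausible reason for pairing outlier labelling with change-point detection; the mean fill could be swapped for a k-NN or Kalman-smoothing imputer.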
