Finding Value in Big Data - Statistical Analysis of Large Data Sets with Applications in Electric Power Systems

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi

Author: Matti Koivisto
Name of the doctoral dissertation: Finding Value in Big Data: Statistical Analysis of Large Data Sets with Applications in Electric Power Systems
Publisher: School of Electrical Engineering
Unit: Department of Electrical Engineering and Automation
Series: Aalto University publication series DOCTORAL DISSERTATIONS 224/2015
Field of research: Power Systems and High Voltage Engineering
Manuscript submitted: 7 August 2015
Date of the defence: 18 January 2016
Permission to publish granted (date): 9 November 2015
Language: English
Monograph / Article dissertation (summary + original articles)

Abstract

A growing volume of data is becoming available in the field of electric power systems. The hourly automatic meter reading (AMR) electricity consumption data available from small customers, such as households and small businesses, is a significant new data source. Other sources, such as geographic data, wind speed data and phasor measurement unit data, add to both the quantity and the significant variety of the available data. This thesis presents how these large data sets can be utilized in power system studies using statistical methodology. A visualization and clustering of a large AMR data set is presented, and consumption models are then estimated for the discovered clusters, i.e., consumer groups. Statistical modelling is applied to wind speed and wind generation data from multiple locations, with the emphasis on understanding the effect of the geographical distribution of wind power. In addition, combined statistical modelling of stochastic distributed generation (e.g., wind and solar power) and electricity consumption is presented, which allows the effects of stochastic generation to be analysed at the distribution system level.
Interesting system operation conditions (e.g., power flows, consumption, wind generation) affecting the expected damping of the 0.35 Hz inter-area oscillation in the Nordic power system are identified, and their use in the short-term prediction of damping is demonstrated using statistical methods. Several different geographically varying risk factors affecting the expected fault rates in power distribution systems are also identified, and the use of the estimated fault rates in automatic network planning is presented. It is argued that the statistical analysis of electricity consumption and generation can also be used in automatic network planning. Although the volume and variety of data are important in enabling data analyses, the value that can be extracted from the data using appropriate data analysis methods is fundamentally the most important aspect. In this thesis, multiple data visualization techniques are presented for finding patterns in the large data sets. The discovered patterns are then modelled using statistical data models. The need to model the probability distributions of the relevant random variables in detail is emphasized. This is especially important in wind power modelling, and was achieved using Monte Carlo simulation.
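
As a rough illustration of the Monte Carlo approach mentioned in the abstract, the following Python sketch simulates the combined output of two wind power sites whose wind speeds are correlated. It is not the model used in the thesis. The Weibull parameters, the Gaussian copula used for the dependence, the cubic power curve and all function names are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weibull_inverse_cdf(u, scale, shape):
    """Map a uniform(0, 1) value to a Weibull-distributed wind speed (m/s)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) for extreme draws
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def power_curve(speed, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Simplified turbine power curve, output as a fraction of capacity."""
    if speed < cut_in or speed >= cut_out:
        return 0.0
    if speed >= rated:
        return 1.0
    # Cubic ramp between cut-in and rated speed (an illustrative assumption).
    return (speed**3 - cut_in**3) / (rated**3 - cut_in**3)


def simulate_aggregate(correlation, n_samples=20000, seed=1):
    """Monte Carlo samples of the mean output of two equally sized sites."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Correlated standard normals (Gaussian copula, assumed here).
        z1 = rng.gauss(0.0, 1.0)
        z2 = correlation * z1 + math.sqrt(1.0 - correlation**2) * rng.gauss(0.0, 1.0)
        # Transform each normal draw to a Weibull wind speed (hypothetical parameters).
        speeds = [weibull_inverse_cdf(normal_cdf(z), scale=8.0, shape=2.0) for z in (z1, z2)]
        samples.append(sum(power_curve(v) for v in speeds) / 2.0)
    return samples


# Compare a weakly and a strongly correlated pair of sites.
for rho in (0.2, 0.9):
    output = simulate_aggregate(rho)
    print(rho, round(statistics.mean(output), 3), round(statistics.pstdev(output), 3))
```

Under these assumptions the mean output is about the same in both cases, while the spread of the aggregate output is narrower for the weakly correlated pair. This is the kind of distributional detail that summary statistics of a single site would not show.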
