Completion of wind turbine data sets for wind integration studies applying random forests and k-nearest neighbors

Abstract

Wind power is growing in importance worldwide as a renewable and cost-efficient power generation technology. Its impact over time on the existing power system, on land use, and in other areas has been studied widely. Such wind integration studies, especially when designed as retrospective bottom-up studies, rely on detailed wind turbine data, including geographic location, hub height, and commissioning date. Because these data sets frequently contain gaps, basic approaches have been developed to cope with missing data points. In this paper, several advanced algorithms were compared with respect to their ability to complete such data sets. One focus was the selection of predictor variables, in order to analyze how the performance of the different completion techniques depends on the specific gaps in the data set. A sample application to a German data set indicated that random forests are particularly well suited to this problem.
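Completing such a data set can be framed as regression: a missing attribute of a turbine (say, its hub height) is predicted from attributes that are known for the same turbine. The block below is a minimal sketch of one of the two techniques in the title, an inverse-distance-weighted k-nearest-neighbors imputer in plain Python. The function name `knn_impute`, the predictor choice (rated power and rotor diameter), the min-max scaling, and the inverse-distance weighting are assumptions for illustration, not the configuration used in the paper, and the turbine values are made up rather than taken from the German data set.

```python
import math
from typing import List, Optional, Sequence


def knn_impute(
    predictors: Sequence[Sequence[float]],
    target: Sequence[Optional[float]],
    k: int = 3,
) -> List[float]:
    """Fill None entries in `target` with an inverse-distance-weighted mean
    of the targets of the k nearest records whose target is known.

    Hypothetical helper for illustration; distances are Euclidean on
    min-max scaled predictors.
    """
    n_features = len(predictors[0])
    # Min-max scale each predictor so that no single unit (kW vs. m)
    # dominates the distance.
    lows = [min(row[j] for row in predictors) for j in range(n_features)]
    highs = [max(row[j] for row in predictors) for j in range(n_features)]

    def scale(row: Sequence[float]) -> List[float]:
        return [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]

    scaled = [scale(row) for row in predictors]
    complete = [i for i, y in enumerate(target) if y is not None]
    if not complete:
        raise ValueError("need at least one record with a known target")

    filled: List[float] = []
    for i, y in enumerate(target):
        if y is not None:
            filled.append(y)  # known values are kept unchanged
            continue
        # Rank the complete records by distance to record i; keep the k closest.
        neighbours = sorted(
            (math.dist(scaled[i], scaled[j]), target[j]) for j in complete
        )[:k]
        # Records with identical predictors (distance 0) are copied directly.
        exact = [v for d, v in neighbours if d == 0.0]
        if exact:
            filled.append(sum(exact) / len(exact))
            continue
        weights = [1.0 / d for d, _ in neighbours]
        weighted_sum = sum(w * v for w, (_, v) in zip(weights, neighbours))
        filled.append(weighted_sum / sum(weights))
    return filled


# Made-up example: (rated power in kW, rotor diameter in m) -> hub height in m.
turbines = [
    (600.0, 44.0),
    (1500.0, 77.0),
    (2000.0, 82.0),
    (3000.0, 101.0),
    (1500.0, 77.0),
]
hub_heights = [50.0, 85.0, 100.0, 135.0, None]

filled = knn_impute(turbines, hub_heights, k=3)
print(filled)  # → [50.0, 85.0, 100.0, 135.0, 85.0]
```

A random forest would fill the same role by fitting a regressor on the records with a known target and predicting the incomplete ones; scikit-learn's `RandomForestRegressor` is one readily available implementation. Which predictors to feed either method depends on which attributes are actually present for the incomplete records, which is the predictor-selection question raised above.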
