CyL-GHI: Global Horizontal Irradiance Dataset Containing 18 Years of Refined Data at 30-Min Granularity from 37 Stations Located in Castile and León (Spain)

Accurate solar forecasting increasingly relies on advances in artificial intelligence and on the availability of databases holding large amounts of meteorological data. In this paper, we present the methodology used to build CyL-GHI, a large-scale, public solar irradiance dataset containing refined data from 37 stations within the Spanish region of Castile and León (Spanish: Castilla y León, or CyL). In addition to data cleaning, the procedure includes steps that add meteorological and geographical variables to complement the initial data. The resulting dataset is delivered both in raw format and with the quality processing applied, and it continuously covers 18 years (1 January 2002 to 31 December 2019) at a temporal resolution of 30 min. CyL-GHI can be of great value in studies of the spatio-temporal characteristics of solar irradiance, because the geographical information it includes enables a regional analysis of the phenomena (the 37 stations cover a land area larger than 94,226 km²). Three popular artificial intelligence algorithms were then optimised and tested on CyL-GHI, and their performance values are offered as baselines against which other forecasting implementations can be compared. Furthermore, the ERA5 values corresponding to the studied area were analysed and compared with the performance values delivered by the trained models. Including previous observations from neighbouring stations as input to an optimised Random Forest model (a spatio-temporal approach) improved the predictive capability of the machine learning models by almost 3%.
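To make the spatio-temporal idea concrete, the following is a minimal sketch of how lagged observations from a target station and its neighbours might be arranged into model inputs. It is not the authors' pipeline: the function name `build_spatiotemporal_samples`, the station labels, the number of lags, and the toy irradiance values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_spatiotemporal_samples(
    series: Dict[str, List[float]],
    target: str,
    neighbours: List[str],
    n_lags: int,
    horizon: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) for one target station.

    Each row of X holds the last `n_lags` observations of the target
    station, followed by the last `n_lags` observations of each neighbour.
    The matching entry of y is the target's value `horizon` steps later.
    """
    stations = [target] + list(neighbours)
    length = min(len(series[s]) for s in stations)
    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the most recent observation available to the model.
    for t in range(n_lags - 1, length - horizon):
        row: List[float] = []
        for s in stations:
            row.extend(series[s][t - n_lags + 1 : t + 1])
        X.append(row)
        y.append(series[target][t + horizon])
    return X, y


# Toy 30-min GHI readings in W/m^2 (invented values, hypothetical stations).
toy = {
    "A": [0.0, 50.0, 120.0, 200.0, 260.0, 300.0],
    "B": [0.0, 40.0, 110.0, 190.0, 250.0, 310.0],
    "C": [5.0, 60.0, 130.0, 210.0, 240.0, 280.0],
}
X, y = build_spatiotemporal_samples(toy, target="A", neighbours=["B", "C"], n_lags=2)
```

A feature matrix built this way could then be passed to any regressor, for example a Random Forest; dropping the neighbour columns gives the purely temporal counterpart for comparison.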
