Automatic hourly solar forecasting using machine learning models

Abstract: Owing to its recent advances, machine learning has spawned a large body of solar forecasting work and is currently one of the most popular approaches for hourly solar forecasting. Nevertheless, there is evidently a myth about forecast accuracy: virtually all research papers claim superiority over others. In practice, the “best” model can only be selected with hindsight, i.e., after empirical evaluation, so it is irrational for solar forecasters starting a new forecasting project to bet on a single model from the outset. In this article, the hourly forecasting performance of 68 machine learning algorithms is evaluated for 3 sky conditions, 7 locations, and 5 climate zones in the continental United States. To ensure a fair comparison, no hybrid model is considered, and only off-the-shelf implementations of these algorithms are used. Moreover, all models are trained using the automatic tuning algorithm available in the R caret package. Tree-based methods are found to perform consistently well in terms of two-year overall results; however, they rarely stand out in daily evaluation. Although no universal model can be found, some preferred models are recommended for each sky and climate condition.
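The contrast between overall and daily evaluation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`rmse`, `rank_models`), the model labels (`steady`, `erratic`), and the toy numbers are hypothetical and are not taken from the study, which used the R caret package rather than Python. RMSE is also assumed as the error metric here only because it is a common choice. The sketch shows how a model with the lowest error over the whole period may still win on fewer individual days than a competitor.

```python
import math


def rmse(pred, obs):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def rank_models(forecasts, observed, hours_per_day=24):
    """Score each model over the whole period and count its daily wins.

    forecasts: dict mapping a (hypothetical) model name to hourly forecasts.
    observed:  hourly observations, same length as each forecast list.
    """
    # Overall score: one RMSE per model across the full evaluation period.
    overall = {name: rmse(f, observed) for name, f in forecasts.items()}

    # Daily evaluation: count the days on which each model has the lowest RMSE.
    daily_wins = {name: 0 for name in forecasts}
    for start in range(0, len(observed), hours_per_day):
        day = slice(start, start + hours_per_day)
        best = min(forecasts, key=lambda n: rmse(forecasts[n][day], observed[day]))
        daily_wins[best] += 1
    return overall, daily_wins


# Toy data (not from the study): three "days" of two hours each.
observed = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
forecasts = {
    # Always off by one unit: never exact, never badly wrong.
    "steady": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    # Exact on the first two days, badly wrong on the third.
    "erratic": [0.0, 0.0, 0.0, 0.0, 3.0, 3.0],
}

overall, daily_wins = rank_models(forecasts, observed, hours_per_day=2)
best_overall = min(overall, key=overall.get)
print(best_overall, daily_wins)
```

In this toy case the `steady` model has the lower overall error, while `erratic` wins on two of the three days, which mirrors the kind of disagreement between overall and daily rankings that the abstract describes.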
