The performance comparison of Multiple Linear Regression, Random Forest and Artificial Neural Network by using photovoltaic and atmospheric data

This study compares the estimation performance of three data mining techniques: Multiple Linear Regression, Random Forest, and Artificial Neural Network. The comparison uses power production data from a photovoltaic module. The model comprises seven variables: one dependent variable (power) and six independent variables (global radiation, temperature, wind speed, wind direction, relative humidity, and solar elevation angle). Estimation performance was compared using the Mean Absolute Error and the correlation coefficient. The correlation coefficient was 0.963 for the Multiple Linear Regression model and 0.986 for the Random Forest method, and the highest value was obtained with the Artificial Neural Network architecture (R = 0.997). All three methods identified global radiation as the most important predictor. The least important predictor was wind direction in both the Artificial Neural Network and Random Forest models, and solar elevation angle in the Multiple Linear Regression model.
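
The two comparison metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (mean of absolute residuals, and the Pearson correlation between measured and predicted power); the abstract does not state which formulas or software the authors used. The function names and the sample values below are hypothetical and are not data from the study.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def correlation_coefficient(y_true, y_pred):
    """Pearson correlation (R) between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is R.
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Hypothetical measured and predicted power values, for illustration only.
measured = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 39.0])

mae = mean_absolute_error(measured, predicted)
r = correlation_coefficient(measured, predicted)
print(f"MAE = {mae:.3f}, R = {r:.3f}")
```

Applying both functions to each model's predictions on the same held-out data gives one MAE and one R per model, which is the kind of side-by-side comparison the abstract reports. A lower MAE and an R closer to 1 both indicate a better fit.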
