Load Forecasting of Power SCADA Based on Spark MLlib

To improve the accuracy and speed of power forecasting in power SCADA systems, a distributed real-time streaming forecasting model is designed based on the K-means and Random Forest algorithms in the Spark machine learning library (MLlib). The model uses a sliding-window mechanism to segment the incoming data stream. K-means clustering is used to correct abnormal data, after which Random Forest regression produces the forecast. The algorithms are implemented on Spark RDDs, and their performance is verified by sending data through a daemon process that simulates a message queue. The results show that the forecasting accuracy of the model is superior to that of traditional serial Random Forest forecasting and that the model satisfies the real-time requirement.

Keywords: Spark; decision tree; random forest; K-means
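
As a rough illustration of the first two stages described in the abstract (sliding-window segmentation followed by K-means-based correction of abnormal readings), the following minimal sketch may help. It is plain single-machine Python, not Spark MLlib, and it is not the implementation evaluated here. The abstract does not state how abnormal points are identified or corrected, so the sketch assumes that readings falling into a sparsely populated cluster are abnormal and replaces them with the centroid of the largest cluster. The names sliding_windows, kmeans_1d, correct_window and the parameters k and min_share are hypothetical. The Random Forest regression stage is omitted.

```python
from statistics import mean


def sliding_windows(stream, size, step):
    """Yield windows of `size` readings, advancing `step` readings each time."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalar readings with deterministic seeding."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced positions of the sorted readings.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster receives no readings.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def correct_window(window, k=2, min_share=0.2):
    """Replace readings in small clusters with the largest cluster's centroid.

    Assumption (not from the abstract): a cluster holding less than
    `min_share` of the window is treated as abnormal.
    """
    centroids = kmeans_1d(window, k)
    labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in window]
    sizes = [labels.count(c) for c in range(k)]
    dominant = max(range(k), key=lambda c: sizes[c])
    return [
        centroids[dominant] if sizes[label] / len(window) < min_share else v
        for v, label in zip(window, labels)
    ]


if __name__ == "__main__":
    readings = [100.0, 102.0, 98.0, 101.0, 500.0, 99.0, 100.0, 103.0]
    for window in sliding_windows(readings, size=6, step=2):
        print(correct_window(window))
```

In a Spark deployment the windowing would be handled by the streaming layer and the clustering by MLlib; the sketch only shows the data flow on a single in-memory list.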