With the rapid development of integrated energy and the digital transformation of power grids, data plays an increasingly important role in the safe operation of power grids. To increase the value of data applications and ensure their accuracy, this paper proposes a data filling method that combines linear interpolation and LightGBM (Light Gradient Boosting Machine) to address missing values in the source network data collection process. The method consists of two steps. First, linear interpolation is used to fill short-term missing data. Then, LightGBM is used to fill long-term missing data; the independent variables fed to the LightGBM model are themselves filled by linear interpolation. The data obtained in this way for each missing ratio are then used to complete all data filling in order from high to low. Tests on actual data show that the method achieves better data filling performance.
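The two-step process described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the gap-length threshold `max_gap`, the helper names `find_gaps`, `interpolate_gaps`, and `fill_long_gaps`, and the column-wise feature layout are all hypothetical, since the abstract does not specify them. The `model` argument is any regressor with scikit-learn style `fit` and `predict` methods.

```python
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def find_gaps(values: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of None."""
    gaps, start = [], None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(values)))
    return gaps


def interpolate_gaps(values: Sequence[Optional[float]],
                     max_gap: Optional[int] = None) -> Series:
    """Linearly interpolate interior gaps; gaps longer than max_gap are skipped."""
    out = list(values)
    for start, end in find_gaps(values):
        if start == 0 or end == len(values):
            continue  # no known neighbour on one side, so leave the gap as is
        if max_gap is not None and end - start > max_gap:
            continue  # assumed "long-term" gap, left for the model step
        left, right = values[start - 1], values[end]
        step = (right - left) / (end - start + 1)
        for k in range(start, end):
            out[k] = left + step * (k - start + 1)
    return out


def fill_long_gaps(target: Sequence[Optional[float]],
                   feature_columns: Sequence[Sequence[Optional[float]]],
                   model) -> Series:
    """Fit `model` on rows with a known target and predict the remaining rows.

    Feature columns are first filled by linear interpolation, as the abstract
    describes for the model's independent variables. Gaps at the very start or
    end of a feature column stay None; this sketch does not handle them.
    """
    columns = [interpolate_gaps(col) for col in feature_columns]
    rows = [list(r) for r in zip(*columns)]
    known = [i for i, v in enumerate(target) if v is not None]
    missing = [i for i, v in enumerate(target) if v is None]
    out = list(target)
    if not missing:
        return out
    model.fit([rows[i] for i in known], [target[i] for i in known])
    predictions = model.predict([rows[i] for i in missing])
    for i, p in zip(missing, predictions):
        out[i] = float(p)
    return out


# Step 1 on a toy series: the one-point gap is filled, the four-point gap is not.
series = [1.0, None, 3.0, None, None, None, None, 8.0]
stage1 = interpolate_gaps(series, max_gap=2)
# stage1 == [1.0, 2.0, 3.0, None, None, None, None, 8.0]
```

For step 2, an instance of `lightgbm.LGBMRegressor` could be passed as `model` when LightGBM is installed, for example `fill_long_gaps(stage1, feature_columns, LGBMRegressor())`. Which features to use, and where the boundary between short-term and long-term gaps lies, would have to come from the paper body or from the data at hand.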