Estimation of Missing Values Using a Weighted K-Nearest Neighbors Algorithm

This paper develops a novel method for estimating missing data values using a weighted k-nearest neighbors algorithm. The weighting scheme exploits the correlation between the “missing” dimension and the values available in the other fields, and this correlation is quantified using support vector regression. The proposed method is applied to a practical case of modeling steel corrosion. Compared with the traditional imputation algorithm, the proposed method yields model results with better generalization capability.
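To make the idea concrete, the following is a minimal sketch of weighted k-nearest neighbors imputation, not the paper's implementation. It rests on several assumptions that the abstract does not state: the relevance of each available field to the missing field is approximated here by the absolute Pearson correlation (a simple stand-in for the support vector regression based measure), the distance is a feature-weighted Euclidean distance, and the estimate is an inverse-distance weighted mean of the neighbors' values. The names `pearson`, `feature_weights`, and `impute` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_weights(complete_rows, target):
    """Weight every non-target column by |correlation| with the target column.

    Assumption: this replaces the SVR-based relevance measure of the paper
    with a much simpler statistic, purely for illustration.
    """
    t = [row[target] for row in complete_rows]
    n_cols = len(complete_rows[0])
    return {
        j: abs(pearson([row[j] for row in complete_rows], t))
        for j in range(n_cols)
        if j != target
    }


def impute(row, target, complete_rows, k=3):
    """Estimate row[target] from the k nearest complete rows."""
    w = feature_weights(complete_rows, target)

    def dist(other):
        # Feature-weighted Euclidean distance over the available columns only.
        return math.sqrt(sum(w[j] * (row[j] - other[j]) ** 2 for j in w))

    neighbors = sorted(complete_rows, key=dist)[:k]
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    eps = 1e-12
    inv = [1.0 / (dist(r) + eps) for r in neighbors]
    return sum(a * r[target] for a, r in zip(inv, neighbors)) / sum(inv)
```

Calling `impute(row, target, complete_rows, k)` with a record whose `target` entry is missing returns the estimate; columns that correlate weakly with the missing field contribute little to the distance, so they have less influence on which neighbors are selected.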