Missing Data Imputation for Categorical Data Based on Random Forest Model

Missing data is an important factor that degrades the quality of survey questionnaire data, and missing data imputation can significantly improve that quality. Categorical data is the main data type in survey data. Filling in a missing categorical value can be treated as a classification problem, which the classification algorithms of data mining are designed to handle, and the random forest is one of the classification models with high predictive accuracy. This paper introduces the random forest model into research on missing data imputation for survey data and proposes a missing data imputation method for categorical data based on the random forest model. An imputation process is also designed for each pattern of missing data. Empirical simulation shows that the proposed method yields more accurate and reliable results than the other imputation methods it is compared with.
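
The general idea of imputing a categorical variable with a random forest can be illustrated with a minimal sketch. It is not the procedure proposed in the paper: it assumes the simplest missing-data pattern (a single incomplete column), marks missing answers with `None`, and treats a `None` in a predictor column as a category of its own. All names (`rf_impute`, `build_tree`, the toy survey columns) are hypothetical, and only the Python standard library is used.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing answer (an assumption of this sketch)


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, rng, n_try):
    """Grow a tree with multiway splits on categorical features."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)  # leaf: a plain class label
    # Random forest step: each node only looks at a random feature subset.
    candidates = rng.sample(features, min(n_try, len(features)))
    best_score, best_f, best_parts = None, None, None
    for f in candidates:
        parts = {}  # category value -> (rows, labels) falling in that branch
        for row, y in zip(rows, labels):
            sub_rows, sub_labels = parts.setdefault(row[f], ([], []))
            sub_rows.append(row)
            sub_labels.append(y)
        score = sum(len(l) * gini(l) for _, l in parts.values()) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_f, best_parts = score, f, parts
    rest = [g for g in features if g != best_f]
    children = {v: build_tree(r, l, rest, rng, n_try)
                for v, (r, l) in best_parts.items()}
    return {"feature": best_f, "children": children, "default": majority(labels)}


def predict_tree(tree, row):
    """Follow branches to a leaf; unseen categories fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def rf_impute(rows, target, n_trees=25, seed=0):
    """Return a copy of `rows` with missing values in column `target` filled in."""
    rng = random.Random(seed)
    features = [j for j in range(len(rows[0])) if j != target]
    known = [r for r in rows if r[target] is not MISSING]
    if not known:
        raise ValueError("no observed values in the target column")
    n_try = max(1, round(len(features) ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(known) for _ in known]  # bootstrap sample
        forest.append(build_tree(sample, [r[target] for r in sample],
                                 features, rng, n_try))
    filled = []
    for r in rows:
        r = list(r)
        if r[target] is MISSING:
            # Majority vote over all trees gives the imputed category.
            r[target] = majority([predict_tree(t, r) for t in forest])
        filled.append(r)
    return filled


# Toy survey data (invented for illustration): region, employment, owns_car.
rows = [["urban", "employed", "yes"] for _ in range(20)]
rows += [["rural", "unemployed", "no"] for _ in range(20)]
rows += [["urban", "employed", MISSING], ["rural", "unemployed", MISSING]]
filled = rf_impute(rows, target=2)
```

Here the forest is trained on the complete cases only and each missing answer receives the majority vote of the trees. Handling several incomplete columns would need an additional scheme, for example imputing the columns one at a time, which this sketch does not attempt.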