A Mimic Learning Method for Disease Risk Prediction with Incomplete Initial Data

The large volume of electronic health records (EHRs) accumulated in recent years provides a rich foundation for disease risk prediction. However, two challenges remain largely unsolved: the incompleteness of the raw data and the interpretability of the prediction model. In this study, we present SR-DF, a mimic learning approach for disease risk prediction on data with a large ratio of missing values, as one of the early attempts to address both. Specifically, we adopt spectral regularization for learning from incomplete medical data, which allows the missing values in the raw data to be estimated and imputed more accurately. Moreover, by utilizing deep forest, we obtain an interpretable and reliable model for disease risk prediction that requires far fewer parameters and is less sensitive to parameter settings. As reported in the experiments, the proposed method outperforms the baselines and achieves relatively consistent and stable results.
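
To make the imputation step concrete, the following is a minimal sketch of spectral regularization in the Soft-Impute style: missing entries are filled by repeatedly taking an SVD of the current completed matrix and shrinking its singular values. The abstract does not specify the exact algorithm or settings used in SR-DF, so the function name `soft_impute`, the threshold `lam`, and the stopping rule are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def soft_impute(X, lam=1.0, max_iter=200, tol=1e-5):
    """Fill NaN entries of X by iterative soft-thresholded SVD.

    Hypothetical helper for illustration; `lam`, `max_iter`, and `tol`
    are assumed parameters, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Initial estimate: observed values kept, missing positions set to zero.
    Z = np.where(observed, X, 0.0)
    for _ in range(max_iter):
        # Keep observed values and use the current estimate elsewhere.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrink singular values toward zero: the spectral regularization step.
        s_shrunk = np.maximum(s - lam, 0.0)
        Z_new = (U * s_shrunk) @ Vt
        # Relative change between successive low-rank estimates.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Observed entries are returned unchanged; only missing ones are imputed.
    return np.where(observed, X, Z)
```

Under these assumptions, the completed matrix returned by `soft_impute` would then serve as the input to the downstream classifier (deep forest in SR-DF). A larger `lam` yields a lower-rank, more strongly regularized estimate.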