The large volume of electronic health records (EHRs) accumulated in recent years provides a rich foundation for disease risk prediction. However, two challenging problems, the incompleteness of raw data and the interpretability of prediction models, have not yet been well addressed. In this study, we present SR-DF, a mimic learning approach for disease risk prediction on data with a large ratio of missing values, as one of the early attempts in this direction. Specifically, we adopt spectral regularization to learn from incomplete medical data, so that the missing values in the raw data can be estimated and imputed more accurately. Moreover, by building on deep forest, we obtain an interpretable and reliable model for disease risk prediction that requires far fewer parameters and is less sensitive to parameter settings. As we report in the experiments, the proposed method outperforms the baselines and achieves relatively consistent and stable results.
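Spectral regularization for incomplete matrices can be illustrated with the Soft-Impute iteration of Mazumder, Hastie and Tibshirani (2010): fill the missing entries with the current low-rank estimate, take an SVD, and soft-threshold the singular values by a penalty λ. The following is a minimal NumPy sketch under stated assumptions, not the SR-DF authors' implementation: the function name `soft_impute`, the penalty `lam`, the stopping rule, and the assumption that the features have already been standardized to comparable scales are all illustrative choices.

```python
import numpy as np


def soft_impute(X, lam, max_iter=100, tol=1e-5):
    """Impute NaN entries of X by nuclear-norm (spectral) regularization.

    Illustrative sketch only; assumes columns are already on comparable
    scales (e.g. standardized), which is not handled here.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    X_obs = np.where(observed, X, 0.0)  # observed values, zeros elsewhere
    Z = np.zeros_like(X)                # current low-rank estimate

    for _ in range(max_iter):
        # Keep observed entries, fill missing ones with the current estimate.
        filled = np.where(observed, X_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values: the spectral regularization step.
        s_shrunk = np.maximum(s - lam, 0.0)
        Z_new = (U * s_shrunk) @ Vt
        # Relative change in Frobenius norm as a convergence check.
        change = np.sum((Z_new - Z) ** 2) / max(np.sum(Z ** 2), 1e-12)
        Z = Z_new
        if change < tol:
            break

    # Return observed values unchanged and imputed values elsewhere.
    return np.where(observed, X, Z)
```

Larger values of `lam` shrink more singular values to zero and therefore yield lower-rank, more conservative imputations; in practice λ would be tuned, for example on a held-out subset of observed entries.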