Deep learning with robustness to missing data: A novel approach to the detection of COVID-19

In the context of the current global pandemic and the limitations of the RT-PCR test, we propose a novel deep learning architecture, DFCN (Denoising Fully Connected Network). Because medical facilities around the world differ enormously in which laboratory tests and chest imaging are available, DFCN is designed to be robust to missing input data. An ablation study evaluates both the performance benefits of DFCN and its robustness to missing inputs. Data from 1088 patients with confirmed RT-PCR results were obtained from two independent medical facilities. For each patient, the data include results from 27 laboratory tests and a chest X-ray scored by a deep learning model, giving 28 inputs in total. The training and test datasets are taken from different medical facilities, and the data are made publicly available. The performance of DFCN in predicting the RT-PCR result is compared with that of three related architectures and a Random Forest baseline. All models are trained with varying levels of masked input data to encourage robustness to missing inputs, and missing data is simulated at test time by masking inputs at random. On random subsets of 2 to 27 available inputs, DFCN outperforms all other models with statistical significance. When all 28 inputs are available, DFCN obtains an AUC of 0.924, higher than any other model. Furthermore, on two clinically meaningful subsets of parameters consisting of just 6 and 7 inputs, DFCN achieves AUCs of 0.909 and 0.919 respectively, again higher than any other model.
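
The random masking used to simulate missing inputs can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_inputs`, the fill value of 0, and the choice to keep a fixed number of inputs per patient are assumptions made for illustration only, since the abstract does not specify how masked inputs are encoded.

```python
import numpy as np


def mask_inputs(x, n_keep, rng, fill_value=0.0):
    """Keep n_keep randomly chosen features per row; overwrite the rest.

    Hypothetical helper: fill_value=0.0 is an assumed encoding for a
    missing input, not one taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_rows, n_features = x.shape
    if not 0 <= n_keep <= n_features:
        raise ValueError("n_keep must be between 0 and the number of features")
    # Rank independent uniform noise within each row; the double argsort
    # yields a random permutation per row, so exactly n_keep ranks fall
    # below n_keep and those positions are the ones kept.
    ranks = rng.random((n_rows, n_features)).argsort(axis=1).argsort(axis=1)
    keep = ranks < n_keep
    return np.where(keep, x, fill_value), keep


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 28))  # 4 synthetic patients, 28 inputs each
masked, keep = mask_inputs(batch, n_keep=6, rng=rng)
```

Calling the same helper with a different `n_keep` for each training batch would correspond to training with varying levels of masked input data, while at test time it would simulate a facility where only `n_keep` of the 28 inputs are available.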
