Deep multi-path network integrating incomplete biomarker and chest CT data for evaluating lung cancer risk

Clinical data elements (CDEs) (e.g., age, smoking history), blood biomarkers, and structural features from chest computed tomography (CT) are each regarded as effective means of assessing lung cancer risk. These variables provide complementary information, and we hypothesize that combining them will improve prediction accuracy. In practice, however, not all patients have all of these variables available. In this paper, we propose a new network design, termed the multi-path multi-modal missing network (M3Net), which integrates multi-modal data (i.e., CDEs, biomarkers, and CT images) with a multi-path neural network while accounting for missing modalities. Each path learns discriminative features of one modality, and the modalities are fused in a second stage to produce an integrated prediction. The network can be trained end-to-end with both medical image features and CDEs/biomarkers, or it can make a prediction from a single modality. We evaluate M3Net on datasets from three sites of the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL) project. Our method is cross-validated within a cohort of 1291 subjects (383 subjects with complete CDEs/biomarkers and CT images) and externally validated on a cohort of 99 subjects (99 with complete CDEs/biomarkers and CT images). Both the cross-validation and external-validation results show that combining multiple modalities significantly improves prediction performance over a single modality. The results also suggest that including subjects who are missing either CDEs/biomarkers or CT imaging features contributes to the discriminatory power of our model (p < 0.05, bootstrap two-tailed test). In summary, the proposed M3Net framework provides an effective way to integrate image and non-image data in the context of missing information.
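The two-stage idea described in the abstract (one path per modality, then fusion, with a prediction still possible when one modality is absent) can be illustrated with a minimal sketch. This is not the M3Net architecture: the class name `TwoPathSketch`, the layer sizes, the random untrained weights, and the choice to zero-fill a missing path and append availability flags are all assumptions made for illustration.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weights for one dense layer (illustrative, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def relu_layer(weights, x):
    """Apply a dense layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


class TwoPathSketch:
    """Hypothetical two-path model: one path per modality, fused in a second stage."""

    def __init__(self, n_clinical, n_image, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.clinical_path = make_linear(n_clinical, n_hidden, rng)
        self.image_path = make_linear(n_image, n_hidden, rng)
        # The fusion head sees both path outputs plus two availability flags.
        self.fusion = [rng.uniform(-0.5, 0.5) for _ in range(2 * n_hidden + 2)]

    def predict(self, clinical=None, image=None):
        """Return a risk score in (0, 1); either modality may be None."""
        if clinical is None and image is None:
            raise ValueError("at least one modality is required")
        zeros = [0.0] * self.n_hidden
        # Assumption: a missing modality contributes a zero feature vector.
        h_c = relu_layer(self.clinical_path, clinical) if clinical is not None else zeros
        h_i = relu_layer(self.image_path, image) if image is not None else zeros
        flags = [float(clinical is not None), float(image is not None)]
        fused = h_c + h_i + flags
        logit = sum(w * v for w, v in zip(self.fusion, fused))
        return 1.0 / (1.0 + math.exp(-logit))


model = TwoPathSketch(n_clinical=3, n_image=5)
cdes = [0.6, 1.0, 0.2]                 # e.g., scaled age, smoking flag, biomarker
ct_features = [0.1, 0.4, 0.3, 0.9, 0.5]  # e.g., image-derived features
both = model.predict(clinical=cdes, image=ct_features)
image_only = model.predict(image=ct_features)
clinical_only = model.predict(clinical=cdes)
```

In this sketch the availability flags let the fusion stage distinguish a genuinely zero feature vector from an absent modality; other strategies, such as imputing the missing features, would fit the same interface.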
