Making sense of large data sets without annotations: analyzing age-related correlations from lung CT scans

The analysis of large data sets can yield knowledge about specific organs or diseases, just as big data analysis does in many non-medical areas. This article aims to extract information from 3D volumes, namely the visual content of lung CT scans of a large number of patients. For the data set described, very little annotation is available: all patients were part of an ongoing screening program, and apart from age and gender no information on the patients or the findings was available for this work. This scenario is common, as image data sets are produced and become available in increasingly large quantities, whereas manual annotations are often missing and clinical data such as text reports are often harder to share. We extracted a set of visual features from 12,414 CT scans of 9,348 patients who had CT scans of the lung taken in the context of a national lung screening program in Belarus. Lung fields were segmented by two segmentation algorithms, and only cases in which both algorithms found the left and right lung and the segmentations had a Dice coefficient above 0.95 were analyzed. This ensures that only segmentations of good quality were used to extract features of the lung. Patients ranged in age from 0 to 106 years. Data analysis shows that age can be predicted with fairly high accuracy for persons under 15 years. Relatively good results were also obtained between 30 and 65 years, where a steady trend is seen. For young adults and older people the results are less good, as variability is very high in these groups. Several visualizations of the data show the evolution patterns of lung texture, size and density with age. The experiments allow the evolution of the lung to be learned, and the results show that even with limited metadata interesting information can be extracted from large-scale visual data.
These age-related changes (for example in lung volume or in the density histogram of the tissue) can also be taken into account when interpreting new cases. The database includes patients with suspicious findings on a chest X-ray, so it does not represent a group of healthy people; only tendencies, and not a model of a healthy lung at a specific age, can be derived.
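
The segmentation quality filter described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the Dice coefficient is computed between the outputs of the two segmentation algorithms, that it is evaluated on the combined left-plus-right lung mask, and that each mask is a flattened sequence of 0/1 voxel labels. The function names `dice_coefficient` and `accept_case` and the dictionary layout are hypothetical.

```python
from typing import Dict, Sequence

Mask = Sequence[int]  # flattened binary mask: 1 = lung voxel, 0 = background


def dice_coefficient(mask_a: Mask, mask_b: Mask) -> float:
    """Dice overlap 2|A n B| / (|A| + |B|) of two equally sized binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    if size_a + size_b == 0:
        # Both masks are empty: treat as a failed segmentation (assumption).
        return 0.0
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


def accept_case(seg_a: Dict[str, Mask], seg_b: Dict[str, Mask],
                threshold: float = 0.95) -> bool:
    """Keep a scan only if both algorithms found both lungs and agree closely.

    seg_a and seg_b are hypothetical containers with "left" and "right" masks.
    """
    for seg in (seg_a, seg_b):
        # An empty mask means the algorithm did not find that lung.
        if not any(seg["left"]) or not any(seg["right"]):
            return False
    # Assumption: agreement is measured on the union of left and right lung.
    combined_a = [1 if (l or r) else 0 for l, r in zip(seg_a["left"], seg_a["right"])]
    combined_b = [1 if (l or r) else 0 for l, r in zip(seg_b["left"], seg_b["right"])]
    return dice_coefficient(combined_a, combined_b) > threshold
```

Measuring agreement per lung rather than on the combined mask would be an equally plausible reading of the text; the sketch picks one for brevity. For real 3D volumes an array library would replace the plain Python sequences.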
