Medical Image Data and Datasets in the Era of Machine Learning—Whitepaper from the 2016 C-MIMI Meeting Dataset Session

At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning (ML) identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data-starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, as well as a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce access to medical image datasets. Communication should also improve among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.
