Highdicom: A Python library for standardized encoding of image annotations and machine learning model outputs in pathology and radiology

Machine learning (ML) is revolutionizing image-based diagnostics in pathology and radiology. ML models have shown promising results in research settings, but the lack of interoperability between ML systems and enterprise medical imaging systems has been a major barrier to clinical integration and evaluation. The DICOM® standard specifies Information Object Definitions (IODs) and Services for the representation and communication of digital images and related information, including image-derived annotations and analysis results. However, the complexity of the standard is an obstacle to its adoption in the ML community and creates a need for software libraries and tools that simplify working with data sets in DICOM format. Here we present the highdicom library, which provides a high-level application programming interface (API) for the Python programming language that abstracts low-level details of the standard and enables encoding and decoding of image-derived information in DICOM format in a few lines of Python code. The highdicom library leverages NumPy arrays for efficient data representation and ties into the extensive Python ecosystem for image processing and machine learning. At the same time, by simplifying creation and parsing of DICOM-compliant files, highdicom achieves interoperability with the medical imaging systems that hold the data used to train and run ML models and that ultimately communicate and store model outputs for clinical use. We demonstrate through experiments with slide microscopy and computed tomography imaging that, by bridging these two ecosystems, highdicom enables developers and researchers to train and evaluate state-of-the-art ML models in pathology and radiology while remaining compliant with the DICOM standard and interoperable with clinical systems at all stages.
To promote standardization of ML research and streamline the ML model development and deployment process, we made the library available free and open-source at https://github.com/mghcomputationalpathology/highdicom.

arXiv:2106.07806v2 [eess.IV] 9 Sep 2021
