A De-Identification Pipeline for Ultrasound Medical Images in DICOM Format

Clinical data sharing between healthcare institutions and between practitioners is often hindered by privacy protection requirements. This problem is particularly critical in collaborative scenarios, where data sharing is essential to establishing a workflow among parties. Anonymizing patient information burned into DICOM images requires more elaborate processes than de-identifying textual information. Typically, areas of the image that contain sensitive information must be removed manually before the images can be shared. In this paper, we present a pipeline for ultrasound medical image de-identification. It is provided both as a free anonymization REST service for medical imaging applications and as a freely available Software-as-a-Service tool that streamlines automatic de-identification for end-users. The proposed approach combines image processing functions and machine-learning models into an automatic system for anonymizing medical images. For character recognition, we evaluated several machine-learning models and selected Convolutional Neural Networks (CNNs) as the best approach. To assess system quality, 500 processed images were manually inspected, showing an anonymization rate of 89.2%. The tool is available at https://bioinformatics.ua.pt/dicom/anonymizer and supports the most recent versions of Google Chrome, Mozilla Firefox, and Safari. A Docker image containing the proposed service is also publicly available to the community.
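The core masking step that such a pipeline automates can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that burned-in annotations appear as bright pixels above a fixed threshold (`threshold=200` is a hypothetical value), locates them with a connected-component flood fill, and blanks each component's bounding box. The names `find_bright_regions` and `mask_regions` are hypothetical. The character-recognition stage, where the paper uses a CNN, is omitted, so every bright region is masked.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]              # 2D grayscale pixel grid
Box = Tuple[int, int, int, int]      # (row_min, col_min, row_max, col_max), inclusive


def find_bright_regions(img: Image, threshold: int = 200) -> List[Box]:
    """Return bounding boxes of 4-connected components with pixels >= threshold.

    Assumption (hypothetical): burned-in text is brighter than the threshold.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or img[r][c] < threshold:
                continue
            # Breadth-first flood fill over one bright component.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


def mask_regions(img: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of img with every box overwritten by the fill value."""
    out = [row[:] for row in img]
    for r0, c0, r1, c1 in boxes:
        for y in range(r0, r1 + 1):
            for x in range(c0, c1 + 1):
                out[y][x] = fill
    return out


if __name__ == "__main__":
    image = [[10] * 6 for _ in range(4)]
    image[1][1] = image[1][2] = 255   # a two-pixel "character"
    image[2][4] = 250                 # an isolated bright pixel
    regions = find_bright_regions(image)
    print(regions)                    # [(1, 1, 1, 2), (2, 4, 2, 4)]
    print(mask_regions(image, regions))
```

In a real deployment, the pixel data would come from the DICOM file, for example via pydicom's `Dataset.pixel_array`. A recognition step would then filter the boxes so that only patient-identifying text is removed, while clinically useful overlays such as measurement scales are kept.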
