Privacy risks of whole-slide image sharing in digital pathology

Access to large volumes of so-called whole-slide images (high-resolution scans of complete pathological slides) has become a cornerstone of the development of novel artificial intelligence methods in pathology for diagnostic use, education and training of pathologists, and research. Nevertheless, a risk-analysis-based methodology for evaluating the privacy risks of sharing such imaging data, and for applying the principle “as open as possible and as closed as necessary”, is still lacking. In this article, we develop a model for privacy risk analysis of whole-slide images that focuses primarily on identity disclosure attacks, as these are the most important from a regulatory perspective. We introduce a taxonomy of whole-slide images with respect to privacy risks and a mathematical model for risk assessment and design. Based on this risk assessment model and the taxonomy, we conduct a series of experiments that demonstrate the risks using real-world imaging data. Finally, we develop guidelines for risk assessment and recommendations for low-risk sharing of whole-slide image data.
