NEUROINFORMATICS Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation

The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the “small N” problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs), which preclude truly automated data mining. These DUAs arise because of concerns about the privacy and confidentiality of subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to share only data derivatives such as statistical summaries; the challenges here are to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint. Thus alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy.
