Sharing big biomedical data

Background: The promise of Big Biomedical Data may be offset by the enormous challenges in handling, analyzing, and sharing it. In this paper, we provide a framework for developing practical and reasonable data sharing policies that incorporate the sociological, financial, technical and scientific requirements of a sustainable, Big Data dependent scientific community.

Findings: Many biomedical and healthcare studies may be significantly impacted by using large, heterogeneous and incongruent datasets; however, significant technical, social, regulatory, and institutional barriers need to be overcome to ensure that the power of Big Data outweighs these detrimental factors.

Conclusions: Pragmatic policies that demand extensive sharing of data, promote data fusion, provenance and interoperability, and balance security and protection of personal information are critical for the long-term impact of translational Big Data analytics.
