Ethics and Epistemology in Big Data Research

Biomedical innovation and translation increasingly emphasize research using “big data.” The hope is that big data methods will both speed up research and make its results more applicable to “real-world” patients and health services. While big data research has been embraced by scientists, politicians, industry, and the public, numerous ethical, organizational, and technical/methodological concerns have also been raised. A common view holds that the technical and methodological concerns will be resolved through sophisticated information technologies, predictive algorithms, and data analysis techniques. While such advances will likely go some way towards resolving these concerns, we believe that the epistemological issues raised by big data research have important ethical implications and raise questions about the very possibility of big data research achieving its goals.
