Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems

The distributed and global nature of data science creates challenges for evaluating the quality, import and potential impact of the data and knowledge claims being produced. This has significant consequences for the management and oversight of responsibilities and accountabilities in data science. In particular, it makes it difficult to determine who is responsible for which output, and how such responsibilities relate to each other; what ‘participation’ means and which accountabilities it involves, with regard to data ownership, donation and sharing as well as data analysis, re-use and authorship; and whether the trust placed in automated tools for data mining and interpretation is warranted (especially as data processing strategies and tools are often developed separately from the situations of data use where ethical concerns typically emerge). To address these challenges, this paper advocates a participative, reflexive management of data practices. Regulatory structures should encourage data scientists to examine the historical lineages and ethical implications of their work at regular intervals. They should also foster awareness of the multitude of skills and perspectives involved in data science, highlighting how each perspective is partial and in need of confrontation with others. This approach has the potential to improve not only the ethical oversight of data science initiatives, but also the quality and reliability of research outputs. This article is part of the themed issue ‘The ethical impact of data science’.
