What difference does quantity make? On the epistemology of Big Data in biology

Is Big Data science a whole new way of doing research? And what difference does data quantity make to knowledge production strategies and their outputs? I argue that the novelty of Big Data science lies not in the sheer quantity of data involved, but rather in (1) the prominence and status acquired by data as a commodity and a recognised output, both within and outside the scientific community, and (2) the methods, infrastructures, technologies, skills and knowledge developed to handle data. These developments generate the impression that data-intensive research is a new mode of doing science, with its own epistemology and norms. To assess this claim, one needs to consider the ways in which data are actually disseminated and used to generate knowledge. Accordingly, this article reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of work in experimental biology. I focus on online databases as prominent infrastructures set up to organise and interpret such data, and examine the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon. This illuminates some of the conditions under which Big Data must be curated to support processes of discovery across biological subfields, which in turn highlights the difficulties caused by the lack of adequate curation for the vast majority of data in the life sciences. In closing, I reflect on the difference that data quantity is making to contemporary biology, the methodological and epistemic challenges of identifying and analysing data given these developments, and the opportunities and worries associated with Big Data discourse and methods.
