Understanding the 'Intensive' in 'Data Intensive Research': Data Flows in Next Generation Sequencing and Environmental Networked Sensors

Genomic and environmental sciences represent two poles of scientific data. In the former, highly parallel sequencing facilities generate large quantities of sequence data. In the latter, loosely networked remote and field sensors produce intermittent streams of different data types. Yet both genomic and environmental sciences are said to be moving to data intensive research. This paper explores and contrasts data flow in these two domains in order to better understand how data intensive research is being done. Our case studies are next generation sequencing for genomics and environmental networked sensors. Our objective was to enrich understanding of the ‘intensive’ processes and properties of data intensive research through a ‘sociology’ of data using methods that capture the relational properties of data flows. Our key methodological innovation was the staging of events at which practitioners with different kinds of expertise in data intensive research participated in the collective annotation of visual forms. Through such events we built a substantial digital data archive of our own, which we then analysed in terms of three traits of data flow: durability, replicability and metrology. We find that analysing data flow with respect to these three traits provides better insight into how doing data intensive research involves people, infrastructures, practices, things, knowledge and institutions. Collectively, these elements shape the topography of data and condition how it flows. We argue that although much attention is given to phenomena such as the scale, volume and speed of data in data intensive research, these are measures of what we call ‘extensive’ properties rather than intensive ones. Our thesis is that extensive changes, that is to say those that result in non-linear changes in metrics, can be seen to result from intensive changes that bring multiple, disparate flows into confluence.
If extensive shifts in the modalities of data flow do indeed come from the alignment of disparate things, as we suggest, then we advocate the staging of workshops and other events with the purpose of developing the ‘missing’ metrics of data flow.
