Examining Data Processing Work as Part of the Scientific Data Lifecycle: Comparing Practices Across Four Scientific Research Groups

Data processing is work that scientists must undertake to make data useful for analysis, and it is a key component of twenty-first century scientific research. The analysis of scientific data is contingent upon the successful collection or production of data and then its processing. This qualitative study of four data-intensive research groups investigates how scientists engage in data processing work, describing and analyzing three distinctive but intertwined practices: cleaning data products, selecting a subset of a data product or assembling a new data product from multiple sources, and transforming data products into a common format. These practices are necessary for researchers to transform an initial data product into one that is ready for scientific analysis. This research finds that data processing work requires a high level of scientific and technical competence, and that this work does not merely set up analyses but also often shapes, and is shaped by, iterations of research designs and the research questions themselves.
