Knowledge infrastructures in science: data, diversity, and digital libraries

Digital libraries can be deployed at many points throughout the life cycles of scientific research projects, from their inception through data collection, analysis, documentation, publication, curation, preservation, and stewardship. Requirements for digital libraries to manage research data vary along many dimensions, including life cycle, scale, research domain, and types and degrees of openness. This article addresses the role of digital libraries in knowledge infrastructures for science, presenting evidence from long-term studies of four research sites. Findings are based on interviews (n = 208), ethnographic fieldwork, document analysis, and historical archival research about scientific data practices, conducted over the course of more than a decade. The Transformation of Knowledge, Culture, and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective project is based on a 2 × 2 design, comparing two “big science” astronomy sites with two “little science” sites that span physical sciences, life sciences, and engineering, and on dimensions of project scale and temporal stage of life cycle. The two astronomy sites invested in digital libraries for data management as part of their initial research design, whereas the smaller sites made smaller investments at later stages. Role specialization varies along the same lines, with the larger projects investing in information professionals, and smaller teams carrying out their own activities internally. Sites making the largest investments in digital libraries appear to view their datasets as their primary scientific legacy, while other sites stake their legacy elsewhere. Those investing in digital libraries are more concerned with the release and reuse of data; types and degrees of openness vary accordingly. The need for expertise in digital libraries, data science, and data stewardship is apparent throughout all four sites. Examples are presented of the challenges in designing digital libraries and knowledge infrastructures to manage and steward research data.
