Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata management practices of researchers through a multi-sited ethnographic study of metadata creation by researchers in the Center for Embedded Networked Sensing (CENS). In studying metadata practices, I focused on the ways that researchers document, describe, annotate, organize, and manage their data, both for their own use and for use by researchers outside their projects. This study shows that researchers within CENS rarely create documentation that is not directly tied to their own use of their data and, correspondingly, rarely share data with users outside their immediate projects. From these observations, I develop a metadata typology with six components: metadata for data identity, data characteristics, data quality, data collection equipment, data collection methods, and data analysis methods. I use a framework of accountability to discuss the ways that metadata practices fit within social research settings. Metadata are situated in regimes of mutual accountability in which researchers learn what is important to document, what counts as sufficient documentation, and how documentation practices are to be accounted for in social research settings. Researchers work within social ontologies in which “metadata-for-data sharing” have very low visibility. As a consequence, when asked to create metadata descriptions of their data for a shared CENS metadata registry, researchers lack specific data users and thus describe their data for members of their most likely “imagined public”: other researchers with shared research interests and methods.
I argue that the cyberinfrastructure vision of widespread data sharing is fundamentally misaligned with the realities of the day-to-day metadata practices of researchers in small-scale field sciences.
