Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

Process studies and coupled-model validation efforts in the geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need that fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet there are no publicly accessible databases that integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database, to provide baseline information on data sources and data sharing, and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4 GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenges overcome were the reconciliation of terms, units, and quality flags, which was necessary to integrate the disparate data sets effectively. Assembling this database required developing relationships with individual researchers and data managers, which often involved extensive e-mail contact. The average number of e-mails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one-third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant data sets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines, where sharing can be as low as 10%.
Our suggestions for building integrated databases include budgeting significant time for e-mail exchanges, being cognizant of the costs versus benefits of pursuing reticent data providers, and building trust through clear, respectful communication and flexible, appropriate attribution.

The Deepwater Horizon Database integrates 8 million georeferenced data points.
40% of data sets were obtained; 36% of our requests for data went unanswered.
Most responses were received after the first request and within the first week.
Major challenges overcome were reconciliation of terms, units, and quality flags.
Significant time needs to be budgeted for data negotiation and building trust.
