The future low-temperature geochemical data-scape as envisioned by the U.S. geochemical community

Abstract Data sharing benefits the researcher, the scientific community, and the public by allowing the impact of data to be generalized beyond one project and by making science more transparent. However, many scientific communities have not developed protocols or standards for publishing, citing, and versioning datasets. One community that lags in data management is that of low-temperature geochemistry (LTG). This paper resulted from an initiative from 2018 through 2020 to convene LTG and data scientists in the U.S. to strategize future management of LTG data. Through webinars, a workshop, a preprint, a townhall, and a community survey, the group of U.S. scientists discussed the landscape of data management for LTG – the data-scape. Currently this data-scape includes a “street bazaar” of data repositories. This was deemed appropriate in the same way that LTG scientists publish articles in many journals. The variety of data repositories and journals reflects the fact that LTG scientists target many different scientific questions, produce data with extremely different structures and volumes, and utilize copious and complex metadata. Nonetheless, the group agreed that publication of LTG science must be accompanied by sharing of data in publicly accessible repositories, and, for sample-based data, registration of samples with globally unique persistent identifiers. LTG scientists should use certified data repositories that are either highly structured databases designed for specialized types of data, or unstructured generalized data systems. Recognizing the need for tools to enable search and cross-referencing across the proliferating data repositories, the group proposed that the overall data informatics paradigm in LTG should shift from “build data repository, data will come” to “publish data online, cybertools will find”.
Funding agencies could also provide portals for LTG scientists to register funded projects and datasets, and forge approaches that cross national boundaries. The needed transformation of the LTG data culture requires that student education emphasize the science and management of data.
