NSF FAIR Chemical Data Publishing Guidelines Workshop on Chemical Structures and Spectra: Major Outcomes and Outlooks for the Chemistry Community

The National Science Foundation Office of Advanced Cyberinfrastructure (NSF-OAC) funded a workshop in March 2019 focused on advancing the sharing of machine-readable chemical structures and spectra. Around 40 stakeholders from the chemistry, chemical information, and software communities took part in the two-day workshop, entitled “FAIR Chemical Data Publishing Guidelines for Chemical Structures and Spectra.” Major topics of discussion included data publishing workflows and guidelines, FAIR criteria and metadata profiles, value propositions, a publisher implementation pilot, and community support and engagement. This report summarizes the workshop conversations, major outcomes, and target areas for further activities. Primary outcomes include the identification of key metadata elements for sharing machine-readable structures and spectra, a sample of concise author guidelines, and a publisher proposal to accept enhanced supporting information files containing these data types and their associated metadata alongside articles. Target areas for further activities include the creation of file and metadata packaging tools that help authors compile their data easily, and more training for stakeholders in generating and handling machine-readable file formats. We conclude this report with our outlooks and highlight several related community efforts initiated after the workshop.
