Revisiting Five Years of CASMI Contests with EPA Identification Tools

Software applications for high-resolution mass spectrometry (HRMS)-based non-targeted analysis (NTA) continue to enhance chemical identification capabilities. Given the variety of available applications, determining the most fit-for-purpose tools and workflows can be difficult. The Critical Assessment of Small Molecule Identification (CASMI) contests were initiated in 2012 to provide a means of evaluating compound identification tools on a standardized set of blinded tandem mass spectrometry (MS/MS) data. Five CASMI contests have resulted in recommendations, publications, and invaluable datasets for practitioners of HRMS-based screening studies. The US Environmental Protection Agency’s (EPA) CompTox Chemicals Dashboard is now recognized as a valuable resource for compound identification in NTA studies. However, the Dashboard was too new and its functionality too immature for it to be entered in any of the five previous CASMI contests. In this work, we performed compound identification on all five CASMI contest datasets using Dashboard tools and data to critically evaluate Dashboard performance relative to that of other applications. CASMI data were accessed via the CASMI webpage and processed for use in our spectral matching and identification workflow. Relative to applications used by former contest participants, our tools, data, and workflow performed well: they placed more challenge compounds in the top five ranked candidates than did the winners of three contest years and tied the winner in a fourth. In addition, we conducted an in-depth review of the CASMI structure sets and made these reviewed sets available via the Dashboard. Our results suggest that Dashboard data and tools would enhance chemical identification capabilities for practitioners of HRMS-based NTA.
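The comparison described above rests on counting how many challenge compounds appear among the top five ranked candidates. A minimal sketch of that count follows, assuming each challenge has a ranked list of candidate identifiers and one known correct identifier. The function name `top_k_count`, the data layout, and all identifiers are hypothetical illustrations rather than the workflow actually used in this work, and tied ranks are not handled.

```python
from typing import Dict, List


def top_k_count(
    ranked: Dict[str, List[str]],
    truth: Dict[str, str],
    k: int = 5,
) -> int:
    """Count challenges whose correct compound is within the first k candidates.

    ranked: challenge ID -> candidate IDs, best candidate first (assumed layout).
    truth:  challenge ID -> ID of the correct compound.
    """
    hits = 0
    for challenge, true_id in truth.items():
        # A challenge with no submitted candidates counts as a miss.
        candidates = ranked.get(challenge, [])
        if true_id in candidates[:k]:
            hits += 1
    return hits


# Hypothetical example data: three challenges with placeholder identifiers.
example_ranked = {
    "challenge-001": ["cand-A", "cand-B", "cand-C"],
    "challenge-002": ["cand-D", "cand-E", "cand-F", "cand-G", "cand-H", "cand-I"],
    "challenge-003": [],
}
example_truth = {
    "challenge-001": "cand-B",  # ranked 2nd, inside the top five
    "challenge-002": "cand-I",  # ranked 6th, outside the top five
    "challenge-003": "cand-Z",  # no candidates returned
}

top5_hits = top_k_count(example_ranked, example_truth, k=5)
```

Computing this count per contest year for each participating tool would give the kind of side-by-side comparison the abstract reports, although the contests' official scoring may also use other measures.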
