Solving CASMI 2013 with MetFrag, MetFusion and MOLGEN-MS/MS.

The second Critical Assessment of Small Molecule Identification (CASMI) contest took place in 2013. A joint team from the Swiss Federal Institute of Aquatic Science and Technology (Eawag) and the Leibniz Institute of Plant Biochemistry (IPB) participated in CASMI 2013 with an automatic, workflow-style entry. MOLGEN-MS/MS was used for Category 1 (molecular formula calculation), restricted by the information given for each challenge. MetFrag and MetFusion were used for Category 2 (structure identification); candidates were retrieved from the compound databases KEGG, PubChem and ChemSpider, and the resulting lists were joined before submission. The Category 1 results guided whether a formula search or an exact mass search was performed for Category 2. The Category 2 results were strong considering the database size and the automated regime used, although they could not compete with the manual approach of the contest winner. The Category 1 results suffered from the large m/z and ppm values in the challenge data, where other participants' strategies that went beyond pure enumeration were more successful. Nevertheless, the combination used for the CASMI 2013 entries proved extremely useful for developing decision-making criteria for automatic, high-throughput general unknown (non-target) identification and for future contests.
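
The hand-off between the two categories can be illustrated with a minimal Python sketch. It is not the team's actual workflow: the function names, the "exactly one formula candidate" rule and the database priority order are assumptions made for illustration only, since the abstract does not state the real decision criteria. The sketch shows a ppm mass window, a choice between formula and exact mass search, and the joining of candidate lists from several databases.

```python
from typing import Dict, List, Tuple


def ppm_window(mass: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) mass bounds for a relative tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def choose_search_mode(formula_candidates: List[str]) -> str:
    """Pick the Category 2 search type from the Category 1 output.

    Hypothetical rule (an assumption, not from the source): search by
    formula only when exactly one formula candidate remains, otherwise
    fall back to an exact mass search.
    """
    return "formula" if len(formula_candidates) == 1 else "exact_mass"


def merge_candidates(per_database: Dict[str, List[str]]) -> List[str]:
    """Join candidate lists, keeping the first occurrence of each identifier.

    The database order below is an illustrative assumption.
    """
    seen = set()
    merged = []
    for database in ("KEGG", "PubChem", "ChemSpider"):
        for identifier in per_database.get(database, []):
            if identifier not in seen:
                seen.add(identifier)
                merged.append(identifier)
    return merged
```

In such a scheme, a wide ppm tolerance at high m/z enlarges the mass window and therefore the number of formula candidates, which would push the hypothetical rule towards the exact mass search.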
