MetFrag relaunched: incorporating strategies beyond in silico fragmentation

Background: The in silico fragmenter MetFrag, launched in 2010, was one of the first approaches to combine compound database searching and fragmentation prediction for small molecule identification from tandem mass spectrometry data. Since then many new approaches have evolved, as has MetFrag itself. This article details the latest developments to MetFrag and its use in small molecule identification since the original publication.

Results: MetFrag has gone through algorithmic and scoring refinements. New features include the retrieval of reference, data source and patent information via ChemSpider and PubChem web services, as well as InChIKey filtering to reduce candidate redundancy due to stereoisomerism. Candidates can be filtered or scored differently based on criteria such as the occurrence of certain elements and/or substructures prior to fragmentation, or presence in so-called "suspect lists". Retention time information can now either be calculated within MetFrag, given a sufficient number of user-provided retention times, or be incorporated separately as "user-defined scores" to be included in candidate ranking. The changes to MetFrag were evaluated on the original dataset as well as on a dataset of 473 merged high resolution tandem mass spectra (HR-MS/MS), and compared with another open source in silico fragmenter, CFM-ID. Using HR-MS/MS information only, and with PubChem as the database, MetFrag2.2 and CFM-ID had 30 and 43 Top 1 ranks, respectively. Including reference and retention information in MetFrag2.2 improved this to 420 and 336 Top 1 ranks with ChemSpider and PubChem (89 and 71 %), respectively, and up to 343 Top 1 ranks (PubChem) when combined with CFM-ID. The optimal parameters and weights were verified using three additional datasets comprising 824 merged HR-MS/MS spectra in total. Further examples are given to demonstrate the flexibility of the enhanced features.

Conclusions: In many cases additional information from the experimental context is available to support small molecule identification, which is especially useful where the mass spectrum alone is not sufficient to select among a large number of candidates. The results achieved with MetFrag2.2 clearly show the benefit of considering this additional information. The new functions greatly enhance the chance of identification success and have been incorporated into a flexible command line interface designed for integration into high throughput workflows. Feedback on the command line version of MetFrag2.2, available at http://c-ruttkies.github.io/MetFrag/, is welcome.
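The InChIKey filtering mentioned in the Results can be illustrated with a minimal sketch. It assumes that stereoisomers are collapsed by comparing the first block of the InChIKey (the 14 characters before the first hyphen, which encode connectivity) and that the highest-scoring candidate in each group is kept. The function name, the candidate tuple layout, the example keys and the scores are all hypothetical and are not taken from MetFrag itself.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate layout: (identifier, InChIKey, score).
Candidate = Tuple[str, str, float]


def filter_by_inchikey_first_block(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one candidate per InChIKey first block (assumed: the best-scoring one)."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        _, inchikey, score = cand
        # The first block precedes the first hyphen and ignores stereochemistry.
        first_block = inchikey.split("-")[0]
        if first_block not in best or score > best[first_block][2]:
            best[first_block] = cand
    # Return the survivors ranked by descending score.
    return sorted(best.values(), key=lambda c: c[2], reverse=True)


# Illustrative values only: two keys sharing a first block, plus one distinct key.
example = [
    ("cand_A", "QNAYBMKLOCPYGJ-REOHSJHZSA-N", 0.82),
    ("cand_B", "QNAYBMKLOCPYGJ-UWTATZPHSA-N", 0.79),
    ("cand_C", "DHMQDGOQFOQNFH-UHFFFAOYSA-N", 0.40),
]
filtered = filter_by_inchikey_first_block(example)
for identifier, inchikey, score in filtered:
    print(identifier, inchikey, score)
```

In this sketch, `cand_A` and `cand_B` share a first block, so only the higher-scoring of the two survives, which shortens the candidate list without discarding any distinct connectivity.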

[1]  Luhua Lai,et al.  A New Atom-Additive Method for Calculating Partition Coefficients , 1997, J. Chem. Inf. Comput. Sci..

[2]  Martin Krauss,et al.  Identification of novel micropollutants in wastewater by a combination of suspect and nontarget screening. , 2014, Environmental pollution.

[3]  中尾 光輝,et al.  KEGG(Kyoto Encyclopedia of Genes and Genomes)〔和文〕 (特集 ゲノム医学の現在と未来--基礎と臨床) -- (データベース) , 2000 .

[4]  Oliver Fiehn,et al.  MS2Analyzer: A Software for Small Molecule Substructure Annotations from Accurate Tandem Mass Spectra , 2014, Analytical chemistry.

[5]  Emma L. Schymanski,et al.  Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry. , 2014, Environmental science & technology.

[6]  Juho Rousu,et al.  Metabolite identification and molecular fingerprint prediction through machine learning , 2012, Bioinform..

[7]  Valery Tkachenko,et al.  Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider , 2011, Journal of The American Society for Mass Spectrometry.

[8]  Takaaki Nishioka,et al.  Winners of CASMI2013: Automated Tools and Challenge Data. , 2014, Mass spectrometry.

[9]  Lubertus Bijlsma,et al.  Critical evaluation of a simple retention time predictor based on LogKow as a complementary tool in the identification of emerging contaminants in water. , 2015, Talanta.

[10]  David S. Wishart,et al.  HMDB 3.0—The Human Metabolome Database in 2013 , 2012, Nucleic Acids Res..

[11]  A. Hogenboom,et al.  Accurate mass screening and identification of emerging contaminants in environmental samples by liquid chromatography-hybrid linear ion trap Orbitrap mass spectrometry. , 2009, Journal of chromatography. A.

[12]  Kiyoko F. Aoki-Kinoshita,et al.  From genomics to chemical genomics: new developments in KEGG , 2005, Nucleic Acids Res..

[13]  Stacy D. Brown,et al.  Identification of “Known Unknowns” Utilizing Accurate Mass Data and Chemical Abstracts Service Databases , 2011, Journal of the American Society for Mass Spectrometry.

[14]  Steffen Neumann,et al.  Annotation of metabolites from gas chromatography/atmospheric pressure chemical ionization tandem mass spectrometry data using an in silico generated compound database and MetFrag. , 2015, Rapid communications in mass spectrometry : RCM.

[15]  A. Leo CALCULATING LOG POCT FROM STRUCTURES , 1993 .

[16]  S. Böcker,et al.  Searching molecular structure databases with tandem mass spectra using CSI:FingerID , 2015, Proceedings of the National Academy of Sciences of the United States of America.

[17]  C. Steinbeck,et al.  Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. , 2006, Current pharmaceutical design.

[18]  Yuan Zhao,et al.  Computation of Octanol-Water Partition Coefficients by Guiding an Additive Model with Knowledge , 2007, J. Chem. Inf. Model..

[19]  Stephen Stein,et al.  Mass spectral reference libraries: an ever-expanding resource for chemical identification. , 2012, Analytical chemistry.

[20]  R. Mannhold,et al.  Calculation of molecular lipophilicity: state of the art and comparison of methods on more than 96000 compounds , 2009, Journal of pharmaceutical sciences.

[21]  Leon P Barron,et al.  Prediction of chromatographic retention time in high-resolution anti-doping screening data using artificial neural networks. , 2013, Analytical chemistry.

[22]  Russ Greiner,et al.  Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification , 2013, Metabolomics.

[23]  Lars Ridder,et al.  Substructure-based annotation of high-resolution multistage MS(n) spectral trees. , 2012, Rapid communications in mass spectrometry : RCM.

[24]  Emma L. Schymanski,et al.  CASMI: And the Winner is .. , 2013, Metabolites.

[25]  Emma L. Schymanski,et al.  Suspect and nontarget screening approaches to identify organic contaminant records in lake sediments , 2014, Analytical and Bioanalytical Chemistry.

[26]  Lars Ridder,et al.  Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa. , 2014, Mass spectrometry.

[27]  Adalbert Kerber,et al.  CASE via MS: Ranking Structure Candidates by Mass Spectra , 2006 .

[28]  P. de Voogt,et al.  Accurate mass screening and identification of emerging contaminants in environmental samples by liquid chromatography-LTQ FT Orbitrap mass spectrometry , 2008 .

[29]  Emma L. Schymanski,et al.  Matching structures to mass spectra using fragmentation patterns: are the results as good as they look? , 2009, Analytical chemistry.

[30]  Thomas Letzel,et al.  Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis , 2015, Analytical and Bioanalytical Chemistry.

[31]  Werner Brack,et al.  Linear Solvation Energy Relationships as classifiers in non-target analysis--a capillary liquid chromatography approach. , 2011, Journal of chromatography. A.

[32]  R. Abagyan,et al.  XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. , 2006, Analytical chemistry.

[33]  Emma L. Schymanski,et al.  Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects , 2016 .

[34]  W. J. Dunn,et al.  Partition coefficient: Determination and estimation , 1986 .

[35]  Steffen Neumann,et al.  MetFusion: integration of compound identification strategies. , 2013, Journal of mass spectrometry : JMS.

[36]  Karl Fraser,et al.  Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics , 2014, Metabolomics.

[37]  M. Hirai,et al.  MassBank: a public repository for sharing mass spectral data for life sciences. , 2010, Journal of mass spectrometry : JMS.

[38]  R. Friedman,et al.  Mass spectral metabonomics beyond elemental formula: chemical database querying by matching experimental with computational fragmentation spectra. , 2008, Analytical chemistry.

[39]  Emma L. Schymanski,et al.  Automatic recalibration and processing of tandem mass spectra using formula annotation. , 2013, Journal of mass spectrometry : JMS.

[40]  Stephen R. Heller,et al.  InChI - the worldwide chemical structure identifier standard , 2013, Journal of Cheminformatics.

[41]  René P Schwarzenbach,et al.  Identification of transformation products of organic contaminants in natural waters by computer-aided prediction and high-resolution mass spectrometry. , 2009, Environmental science & technology.

[42]  Egon L. Willighagen,et al.  The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo-and Bioinformatics , 2003, J. Chem. Inf. Comput. Sci..

[43]  Martin Krauss,et al.  Consensus structure elucidation combining GC/EI-MS, structure generation, and calculated properties. , 2012, Analytical chemistry.

[44]  Matthias Müller-Hannemann,et al.  In silico fragmentation for computer assisted identification of metabolite mass spectra , 2010, BMC Bioinformatics.

[45]  E. Kováts,et al.  Gas‐chromatographische Charakterisierung organischer Verbindungen. Teil 1: Retentionsindices aliphatischer Halogenide, Alkohole, Aldehyde und Ketone , 1958 .