Evaluation of an Artificial Neural Network Retention Index Model for Chemical Structure Identification in Nontargeted Metabolomics.

Liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) is a major analytical technique for nontargeted identification of metabolites in biological fluids. Typically, in LC-ESI-MS/MS-based, database-assisted structure elucidation pipelines, the exact mass of an unknown compound is used to mine a chemical structure database for an initial set of possible candidates. Subsequent matching of the collision-induced dissociation (CID) spectrum of the unknown to the CID spectra of candidate structures facilitates identification. However, this approach often fails because of the large numbers of potential candidates (i.e., false positives) for which CID spectra are not available. To overcome this problem, CID fragmentation prediction programs have been developed, but these also have limited success if large numbers of isomers with similar CID spectra are present in the candidate set. In this study, we investigated the use of a retention index (RI) predictive model as an orthogonal method to help improve identification rates. The model was used to eliminate candidate structures whose predicted RI values differed significantly from the experimentally determined RI value of the unknown compound. We tested this approach using a set of 91 endogenous metabolites and four in silico CID fragmentation algorithms: CFM-ID, CSI:FingerID, Mass Frontier, and MetFrag. Candidate sets obtained from PubChem and the Human Metabolome Database (HMDB) were ranked with and without RI filtering followed by in silico spectral matching. Upon RI filtering, 12 of the 91 metabolites were eliminated from their respective candidate sets, i.e., were scored incorrectly as negatives. For the remaining 79 compounds, we show that RI filtering eliminated an average of 58% of the candidates from PubChem candidate sets. This resulted in an approximately 2-fold improvement in average rankings when using CFM-ID, Mass Frontier, and MetFrag. In addition, RI filtering slightly increased the occurrence of number-one rankings for all four fragmentation algorithms. However, RI filtering did not significantly improve average rankings when HMDB was used as the candidate database, nor did it significantly improve average rankings when using CSI:FingerID. Overall, we show that the current RI model incorrectly eliminated more true positives (12) than were expected (4-5) on the basis of the filtering method. However, it slightly improved the number of correct first-place rankings and improved overall average rankings when using CFM-ID, Mass Frontier, and MetFrag.
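The filter-then-rank workflow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the symmetric RI window, and the function names are hypothetical, and the abstract does not state the window actually used. The 0.95 retention probability in the usage lines is likewise an assumption, chosen only because 5% of 91 is about 4.6, which is consistent with the reported expectation of 4-5 lost true positives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    predicted_ri: float    # RI predicted by some model (hypothetical values)
    spectral_score: float  # in silico CID match score; higher is better


def ri_filter(candidates: List[Candidate], experimental_ri: float,
              window: float) -> List[Candidate]:
    """Keep candidates whose predicted RI lies within +/- window of the
    experimental RI. The symmetric window is an assumption of this sketch."""
    return [c for c in candidates
            if abs(c.predicted_ri - experimental_ri) <= window]


def rank_of(candidates: List[Candidate], true_name: str) -> Optional[int]:
    """1-based rank of the true compound by spectral score, or None if the
    true compound is absent (for example, removed by the RI filter)."""
    ranked = sorted(candidates, key=lambda c: c.spectral_score, reverse=True)
    for position, cand in enumerate(ranked, start=1):
        if cand.name == true_name:
            return position
    return None


def expected_false_negatives(n_compounds: int, retain_prob: float) -> float:
    """Expected number of true positives lost if each one is retained
    independently with probability retain_prob (assumed, not from the source)."""
    return n_compounds * (1.0 - retain_prob)


# Invented candidate set for a single unknown; "B" is the true structure.
candidates = [
    Candidate("A", predicted_ri=500.0, spectral_score=0.90),
    Candidate("B", predicted_ri=320.0, spectral_score=0.80),
    Candidate("C", predicted_ri=310.0, spectral_score=0.70),
    Candidate("D", predicted_ri=700.0, spectral_score=0.95),
]

rank_unfiltered = rank_of(candidates, "B")
filtered = ri_filter(candidates, experimental_ri=325.0, window=50.0)
rank_filtered = rank_of(filtered, "B")
lost_estimate = expected_false_negatives(91, 0.95)
```

In this toy example the RI filter removes the two higher-scoring candidates whose predicted RI is far from the measured value, so the true structure moves up in the ranking. If the model mispredicts the RI of the true structure, `rank_of` returns `None`, which corresponds to the false negatives discussed in the abstract.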
