MetGem Software for the Generation of Molecular Networks Based on the t-SNE Algorithm.

Molecular networking (MN) is becoming a standard bioinformatics tool in the metabolomics community. Its paradigm is based on the observation that compounds with a high degree of chemical similarity share comparable MS2 fragmentation pathways. To afford a clear separation between MS2 spectral clusters, only the most relevant similarity scores are selected using dedicated filtering steps that require time-consuming parameter optimization. Depending on the filtering values selected, some scores are arbitrarily deleted and part of the information is ignored. The problem of creating a reliable representation of MS2 spectra data sets can be solved using algorithms developed for dimensionality reduction and pattern recognition purposes, such as t-distributed stochastic neighbor embedding (t-SNE). This multivariate embedding method pays particular attention to local details by using nonlinear outputs to represent the entire data space. To overcome the limitations inherent to the GNPS workflow and the networking architecture, we developed MetGem. Our software allows the parallel investigation of two complementary representations of the raw data set, one based on a classic GNPS-style MN and another based on the t-SNE algorithm. The t-SNE graph preserves the interactions between related groups of spectra, while the MN output allows an unambiguous separation of clusters. Additionally, almost all parameters can be tuned in real time, and new networks can be generated within a few seconds for small data sets. With the development of this unified interface (https://metgem.github.io), we fulfilled the need for a dedicated, user-friendly, local software for MS2 comparison and spectral network generation.
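
The effect of score filtering described above can be illustrated with a minimal sketch. It is not MetGem's or GNPS's implementation: the toy binned spectra, the plain cosine score, the `min_score` threshold, and the mutual top-k rule are all assumptions chosen for illustration. It shows how a spectrum related to two clusters can lose all of its edges once the filter is applied.

```python
from math import sqrt


def cosine(a, b):
    """Plain cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_edges(spectra, min_score=0.7, top_k=2):
    """Score every pair, then keep a pair only if it passes min_score and
    each spectrum ranks the other among its top_k best-scoring neighbours.
    Both parameters are hypothetical filter settings."""
    n = len(spectra)
    scores = {(i, j): cosine(spectra[i], spectra[j])
              for i in range(n) for j in range(i + 1, n)}

    def score(i, j):
        return scores[(min(i, j), max(i, j))]

    # Top-k neighbour set of each spectrum, ranked by similarity.
    top = {}
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: score(i, j), reverse=True)
        top[i] = set(ranked[:top_k])

    kept = {(i, j): s for (i, j), s in scores.items()
            if s >= min_score and j in top[i] and i in top[j]}
    return scores, kept


# Toy binned spectra (made-up intensities): two tight pairs and one
# spectrum (index 4) that shares fragments with both groups.
spectra = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.9, 0.1],
    [0.0, 1.0, 0.0, 1.0],
    [0.1, 1.0, 0.0, 0.9],
    [1.0, 1.0, 0.0, 0.0],
]

scores, kept = filter_edges(spectra)
connected = {node for pair in kept for node in pair}
print(f"{len(kept)} of {len(scores)} pairwise scores kept: {sorted(kept)}")
print("isolated spectra:", sorted(set(range(len(spectra))) - connected))
```

In this toy case the bridging spectrum scores about 0.5 against both groups, so every one of its edges falls below the threshold and its relationship to either cluster disappears from the network. An embedding computed from the full score matrix, as the t-SNE view does, would still have access to those intermediate scores.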
