Chemically informed analyses of metabolomics mass spectrometry data with Qemistree

Untargeted mass spectrometry is used to detect small molecules in complex biospecimens, but the resulting data are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply to metabolomics data the ecological tools that were designed to analyze and visualize the relatedness of DNA sequences. Here we demonstrate how tree-guided data exploration tools can compare metabolomics samples across experimental conditions, such as chromatographic shifts. Additionally, we use the tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities as a QIIME 2 plugin and as a Global Natural Products Social Molecular Networking (GNPS) workflow. Qemistree uses fragmentation spectra to predict molecular fingerprints and represents their relationships as a tree, enabling comparison of metabolomics data across experimental conditions and exploration of chemical diversity in mixtures.
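The core idea can be illustrated with a minimal Python sketch: organize one fingerprint per spectral feature into a tree, then apply a tree-based ecological metric to the features observed in a sample. The sketch makes several assumptions that are not taken from the Qemistree code base. Fingerprints are binary sets of on-bit indices (real predicted fingerprints may be probabilistic rather than binary). Distances use the Tanimoto (Jaccard) index, and the tree is built by average-linkage (UPGMA) clustering. The function names `tanimoto_distance`, `upgma`, `faith_pd`, and `to_newick` are hypothetical. Faith's phylogenetic diversity stands in as one example of a tree-based ecological tool.

```python
# Hypothetical sketch, not Qemistree's implementation: binary fingerprints,
# Tanimoto distance, and UPGMA clustering are illustrative assumptions.
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Tree node; `height` is the UPGMA merge height (0 for leaves)."""
    height: float
    name: str | None = None
    children: list[Node] = field(default_factory=list)


def tanimoto_distance(a: frozenset[int], b: frozenset[int]) -> float:
    """1 - Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def upgma(fingerprints: dict[str, frozenset[int]]) -> Node:
    """Average-linkage (UPGMA) clustering of fingerprints into a rooted tree."""
    names = list(fingerprints)
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (Node(0.0, name=n), 1) for i, n in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {
        (i, j): tanimoto_distance(fingerprints[names[i]], fingerprints[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair; the new node sits at half their distance.
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # Size-weighted average distance from the merged cluster to k.
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (Node(d / 2.0, children=[ni, nj]), si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0]


def faith_pd(root: Node, observed: set[str]) -> float:
    """Sum of branch lengths connecting the observed leaves to the root."""
    def walk(node: Node) -> tuple[bool, float]:
        if not node.children:
            return node.name in observed, 0.0
        present, total = False, 0.0
        for child in node.children:
            child_present, child_total = walk(child)
            if child_present:
                # Count this child's branch only if it leads to an observed leaf.
                present = True
                total += child_total + (node.height - child.height)
        return present, total

    return walk(root)[1]


def to_newick(root: Node) -> str:
    """Serialize the tree to Newick; branch length = parent height - child height."""
    def fmt(node: Node, parent_height: float) -> str:
        if node.children:
            label = "(" + ",".join(fmt(c, node.height) for c in node.children) + ")"
        else:
            label = node.name or ""
        return f"{label}:{parent_height - node.height:.3f}"

    if not root.children:
        return f"{root.name};"
    return "(" + ",".join(fmt(c, root.height) for c in root.children) + ");"
```

As a toy example, take four fingerprints: A = {1, 2, 3, 4}, B = {1, 2, 3, 5}, C = {10, 11, 12}, and D = {10, 11, 13}. The sketch produces the tree `((A:0.200,B:0.200):0.300,(C:0.250,D:0.250):0.250);`. A sample containing the structurally related features A and B has a Faith's PD of 0.7. A sample containing the unrelated features A and C scores 1.0, which reflects its greater chemical diversity. The Newick string is the kind of output a standard tree viewer can display alongside sample metadata.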
