Algorithmic Learning for Auto-deconvolution of GC-MS Data to Enable Molecular Networking within GNPS

Gas chromatography-mass spectrometry (GC-MS) is an analytical technique of significant practical and societal impact. Spectral deconvolution is an essential step for interpreting GC-MS data. No existing public GC-MS repository also enables repository-scale analysis, in part because deconvolution requires significant user input. We therefore engineered a scalable machine learning workflow for the Global Natural Products Social Molecular Networking (GNPS) analysis platform that enables the mass spectrometry community to store, process, share, annotate, compare, and perform molecular networking of GC-MS data. The workflow auto-deconvolutes compound fragmentation patterns via unsupervised non-negative matrix factorization, using a Fast Fourier Transform-based strategy to overcome scalability limitations. We introduce a “balance score” that quantifies the reproducibility of fragmentation patterns across all samples. We demonstrate the utility of the platform with a breathomics analysis applied to the early detection of oesophago-gastric cancer, and by creating the first molecular spatial map of the human volatilome.
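
As a rough illustration of the deconvolution idea, the sketch below factorizes a small chromatographic window, stored as a non-negative matrix X of scans by m/z bins, into elution profiles W and component spectra H so that X ≈ WH. It uses the classic Lee-Seung multiplicative updates on synthetic data. The function name `nmf`, the matrix layout, the iteration count and the toy data are assumptions for illustration only; this is not the GNPS workflow's implementation, which additionally relies on its FFT-based strategy to scale.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical helper: factorize non-negative X (scans x m/z bins) as W @ H.

    W (scans x k) holds elution profiles; H (k x m/z bins) holds component spectra.
    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative and, per Lee and Seung,
        # do not increase the reconstruction error ||X - WH||_F.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic window (assumption): two co-eluting compounds with Gaussian elution
# profiles and random non-negative spectra over 20 m/z bins.
t = np.arange(60)
profiles = np.stack(
    [np.exp(-0.5 * ((t - 25) / 4) ** 2), np.exp(-0.5 * ((t - 32) / 4) ** 2)], axis=1
)
spectra = np.random.default_rng(1).random((2, 20))
X = profiles @ spectra

W, H = nmf(X, k=2, n_iter=2000)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

NMF factors are only unique up to scaling and, in general, further ambiguity, so a sketch like this checks the reconstruction rather than expecting the recovered spectra to match the true ones exactly.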

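Molecular networking in GNPS links spectra whose similarity exceeds a threshold. A minimal sketch, assuming deconvolved electron-ionization spectra have already been binned to a shared unit-mass m/z grid: each spectrum is a node, and an edge joins two spectra when their cosine similarity is at least a cutoff. The helper names `cosine` and `build_network`, the 0.7 cutoff and the toy spectra are illustrative assumptions. GNPS networking parameters also include further filters (for example, a minimum number of matched peaks) that are omitted here.

```python
import itertools

import numpy as np


def cosine(a, b):
    """Cosine similarity of two spectra binned on the same m/z grid."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def build_network(spectra, threshold=0.7):
    """Hypothetical helper: return edges (i, j, score) for pairs with cosine >= threshold."""
    edges = []
    for i, j in itertools.combinations(range(len(spectra)), 2):
        score = cosine(spectra[i], spectra[j])
        if score >= threshold:
            edges.append((i, j, score))
    return edges


# Toy intensities over six m/z bins: spectra 0 and 1 are near-identical, spectrum 2 differs.
spectra = [
    np.array([0, 10, 50, 100, 0, 5], dtype=float),
    np.array([0, 12, 45, 100, 2, 4], dtype=float),
    np.array([80, 0, 0, 5, 100, 0], dtype=float),
]
edges = build_network(spectra)
print(edges)
```

Only the pair (0, 1) passes the cutoff here; spectrum 2 stays a singleton node.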
Wout Bittremieux | Alexander A. Aksenov | Yann Guitton | Mingxun Wang | Pieter C. Dorrestein | Zheng Zhang | Thomas O. Metz | Madeleine Ernst | Itzhak Mizrahi | Ivan Laponogov | Louis-Félix Nothias | Daniel Petras | Reza Mirnezami | James T. Morton | Sophie LF Doran | Ilaria Belluomo | Dennis Veselkov | Mélissa Nothias-Esposito | Katherine N. Maloney | Biswapriya B. Misra | Alexey V. Melnik | Kenneth L. Jones | Kathleen Dorrestein | Morgan Panitchpakdi | Justin J.J. van der Hooft | Mabel Gonzalez | Chiara Carazzone | Adolfo Amézquita | Chris Callewaert | Robert Quinn | Amina Bouslimani | Andrea Albarracín Orio | Andrea M. Smania | Sneha P. Couvillion | Meagan C. Burnet | Carrie D. Nicora | Erika Zink | Viatcheslav Artaev | Elizabeth Humston-Fulmer | Rachel Gregor | Michael M. Meijler | Stav Eyal | Brooke Anderson | Rachel Dutton | Raphaël Lugan | Pauline Le Boulch | Stephanie Prevost | Audrey Poirier | Gaud Dervilly | Bruno Le Bizec | Aaron Fait | Noga Sikron Persi | Chao Song | Kelem Gashu | Roxana Coras | Monica Guma | Julia Manasson | Jose U. Scher | Dinesh Barupal | Saleh Alseekh | Alisdair Fernie | Vasilis Vasiliou | Robin Schmid | Roman S. Borisov | Larisa N. Kulikova | Rob Knight | George B Hanna | Kirill Veselkov
