Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography-Mass Spectrometry Data

Motivation: Metabolic profiling of breath involves processing, aligning, scaling and clustering thousands of features extracted from gas chromatography-mass spectrometry (GC-MS) data from hundreds of participants. This multi-step data processing is complicated, time-consuming and prone to operator error. Automated clustering methods that group features quickly and reliably are therefore needed to accelerate metabolic profiling and discovery platforms for next-generation medical diagnostic tools.

Results: Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC-MS breath data. VOCCluster was created from a heuristic ontology based on observation of experts undertaking data processing with a suite of software packages. It identifies and clusters groups of volatile organic compounds (VOCs) from deconvolved GC-MS breath data that have similar mass spectra and retention index profiles. VOCCluster was used to cluster more than 15,000 features extracted from 74 GC-MS clinical breath samples obtained from participants with cancer before and after radiation therapy. It clustered those features into 1081 groups (including endogenous compounds, exogenous compounds and instrumental artifacts) with an accuracy rate of 96% (± 0.04 at 95% confidence interval). Results were evaluated against a panel of ground truth compounds and compared with other clustering methods used in previous metabolomics studies, such as DBSCAN and OPTICS.

Availability: The source code and the data used in this paper are available for download at https://github.com/Yaser218/Untargeted-Metabolomics-Clustering.

Contact: D.Salman@lboro.ac.uk
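To make the idea of grouping features by "similar mass spectra and retention index profiles" concrete, the following is a minimal illustrative sketch. It is not the VOCCluster algorithm, whose details are not given here. The function names, the greedy seed-based grouping, the cosine similarity measure, and the thresholds `ri_tol` and `min_sim` are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse spectra given as {m/z: intensity} dicts."""
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_features(features, ri_tol=5.0, min_sim=0.9):
    """Greedy single-pass grouping (illustrative assumption, not VOCCluster).

    Each feature is a (retention_index, spectrum) pair. A feature joins the
    first cluster whose seed (first member) lies within ri_tol retention
    index units and has spectral cosine similarity >= min_sim; otherwise it
    starts a new cluster. Returns a list of lists of feature indices.
    """
    clusters = []
    for idx, (ri, spectrum) in enumerate(features):
        for cluster in clusters:
            seed_ri, seed_spectrum = features[cluster[0]]
            if (abs(ri - seed_ri) <= ri_tol
                    and cosine_similarity(spectrum, seed_spectrum) >= min_sim):
                cluster.append(idx)
                break
        else:
            # No existing cluster matched on both criteria.
            clusters.append([idx])
    return clusters


# Hypothetical toy features: (retention index, {m/z: intensity}).
features = [
    (800.0, {43: 100, 58: 30}),
    (801.5, {43: 95, 58: 33}),    # close RI, near-identical spectrum
    (1200.0, {43: 100, 58: 30}),  # same spectrum, distant RI
    (800.5, {91: 100, 92: 60}),   # close RI, different spectrum
]
clusters = cluster_features(features)
print(clusters)  # [[0, 1], [2], [3]]
```

In this toy input only the first two features agree on both criteria, so they share a cluster. The third matches on spectrum alone and the fourth on retention index alone, so each forms its own group. Density-based methods such as DBSCAN and OPTICS, named above as comparison methods, take a different approach and are not shown here.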
