VOCCluster: Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography-Mass Spectrometry Data

Metabolic profiling of breath analysis involves processing, alignment, scaling and clustering of thousands of features extracted from gas chromatography-mass spectrometry (GC-MS) data from hundreds of participants. This multi-step data processing is complicated, prone to operator error and time-consuming, so automated clustering methods that group features quickly and reliably are necessary. Such methods accelerate metabolic profiling and discovery platforms for next-generation medical diagnostic tools. Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC-MS breath data. VOCCluster was created from a heuristic ontology based on observing experts undertake data processing with a suite of software packages. It identifies and clusters groups of volatile organic compounds (VOCs) with similar mass spectra and retention index profiles from deconvolved GC-MS breath data. VOCCluster was used to cluster more than 15,000 features extracted from 74 GC-MS clinical breath samples obtained from participants with cancer before and after radiation therapy. Results were evaluated against a panel of ground truth compounds and compared to other clustering methods (DBSCAN and OPTICS) that were used in previous metabolomics studies. VOCCluster clustered those features into 1081 groups (including endogenous and exogenous compounds and instrumental artefacts) with an accuracy rate of 96% (± 0.04 at 95% confidence interval).
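The general idea of grouping features by similar mass spectra and retention index can be illustrated with a minimal Python sketch. This is not the VOCCluster algorithm, which the abstract does not specify; it is a simple greedy grouping under stated assumptions. The feature layout (a dict with `id`, `ri` and `spectrum` keys), the function names, and the thresholds `ri_tolerance` and `min_similarity` are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse spectra given as {m/z: intensity}."""
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_features(features, ri_tolerance=5.0, min_similarity=0.9):
    """Greedy grouping (illustrative only, not the published method).

    Features are visited in order of retention index. Each feature joins the
    first existing cluster whose seed (first member) lies within
    `ri_tolerance` retention index units and whose spectrum has a cosine
    similarity of at least `min_similarity`; otherwise it starts a new cluster.
    Both thresholds are assumed values, not taken from the paper.
    """
    clusters = []
    for feat in sorted(features, key=lambda f: f["ri"]):
        for cluster in clusters:
            seed = cluster[0]
            close_in_ri = abs(feat["ri"] - seed["ri"]) <= ri_tolerance
            similar = cosine_similarity(feat["spectrum"], seed["spectrum"]) >= min_similarity
            if close_in_ri and similar:
                cluster.append(feat)
                break
        else:
            # No existing cluster matched on both criteria.
            clusters.append([feat])
    return clusters
```

In this sketch, two features are grouped only when both criteria hold, so co-eluting compounds with different spectra and identical spectra at distant retention indices end up in separate groups. A density-based method such as DBSCAN or OPTICS, which the abstract uses as a comparison, would instead derive groups from neighbourhood density rather than from a fixed seed.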
