Fast and automated biomarker detection in breath samples with machine learning

Volatile organic compounds (VOCs) in human breath can reveal a wide spectrum of health conditions and can be used for fast, accurate and non-invasive diagnostics. Gas chromatography-mass spectrometry (GC-MS) is used to measure VOCs, but its application is limited by expert-driven data analysis, which is time-consuming and subjective and may introduce errors. We propose a system for GC-MS data analysis that exploits the pattern-recognition ability of deep learning to learn and automatically detect VOCs directly from raw data, thus bypassing expert-led processing. The proposed approach outperformed the expert-led analysis, detecting a significantly higher number of VOCs in a fraction of the time while maintaining high specificity. These results suggest that the proposed method can support the large-scale deployment of breath-based diagnosis by reducing time and cost and by increasing accuracy and consistency.
