Machine learning methods on exhaled volatile organic compounds for distinguishing COPD patients from healthy controls

Exhaled volatile organic compounds (VOCs) have shown promise in diagnosing chronic obstructive pulmonary disease (COPD), but studies have been limited by small sample sizes and potential confounders. This study investigated whether combinations of VOCs could distinguish COPD patients from age- and BMI-matched controls. Breath samples were collected from 119 stable COPD patients and 63 healthy controls using a portable apparatus and then assayed by gas chromatography and mass spectrometry. Machine learning approaches were applied to the data, and the automatically generated models were assessed using classification accuracy and receiver operating characteristic (ROC) curves. In cross-validation, the VOC combinations correctly predicted the diagnosis in 79% of COPD patients and 64% of controls, with an optimum area under the ROC curve of 0.82. Comparison of current smokers and ex-smokers within the COPD group showed that smoking status was likely to affect the classification: smoking status was correctly predicted in 85% of COPD subjects. When current smokers were omitted from the analysis, correct prediction of COPD was similar at 78%, but correct prediction of controls increased to 74%. Applying different analytical methods to the largest group of subjects studied so far suggests that VOC analysis holds promise for diagnosing COPD, but that smoking status needs to be balanced between groups.
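The evaluation measures named in the abstract (per-class correct prediction and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the toy labels and scores are hypothetical, and the overall accuracy at the end is only an approximation derived from the rounded percentages and group sizes reported above.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Fraction of positives (COPD = 1) and of negatives (control = 0) predicted correctly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def overall_accuracy(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Combine per-class rates into one accuracy, weighting by group size."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


# Hypothetical toy data: four subjects with out-of-fold predictions and scores.
labels = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

print(sensitivity_specificity(labels, predicted))
print(roc_auc(labels, scores))

# Approximate overall accuracy implied by the reported 79% / 64% on 119 / 63 subjects.
print(round(overall_accuracy(0.79, 0.64, 119, 63), 3))
```

Because the COPD group is almost twice the size of the control group, the overall accuracy is pulled toward the COPD rate, which is one reason for reporting the two per-class figures and the ROC curve separately rather than relying on a single accuracy value.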
