Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established, inexpensive, non-invasive bioanalytical technology for detecting volatile organic compounds (VOCs), with various metabolomics applications in medical research. Several steps remain before this technology can be used routinely in medical practice. In modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into groups, for instance healthy or not. Although sophisticated machine learning methods exist, they depend on an unavoidable preprocessing step: reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (referred to as manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. First, we apply established machine learning methods and systematically study their classification performance on each peak detector's results. Second, we investigate the classification variance and robustness with respect to perturbation and overfitting. Our main finding is that classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, all tools, manual and automatic, are similarly robust against perturbations. However, classification performance is more robust against overfitting when PME is used as the peak calling preprocessor. In summary, we conclude that all methods, despite small differences, are largely reliable and enable a wide spectrum of real-world biomedical applications.
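
To make the simplest of the four approaches concrete, the following is a minimal sketch of a local maxima search on a two-dimensional intensity matrix. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function name `local_maxima`, the `threshold` parameter, the 8-neighborhood rule, and the toy data are all hypothetical, and a practical pipeline would likely add baseline correction and smoothing first.

```python
from typing import List, Tuple


def local_maxima(matrix: List[List[float]],
                 threshold: float = 0.0) -> List[Tuple[int, int, float]]:
    """Return (row, col, intensity) for every cell that is at least
    `threshold` and strictly greater than all of its (up to 8) neighbors.

    Assumption: rows and columns index the two measurement axes of an
    MCC/IMS measurement; which axis is which does not matter here.
    """
    peaks = []
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value < threshold:
                continue  # too weak to count as a peak candidate
            # Collect the neighbors, clipping the window at the borders.
            neighbors = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                peaks.append((r, c, value))
    return peaks


# Hypothetical toy data with two well-separated peaks.
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.05],
]
print(local_maxima(toy, threshold=0.5))
```

The strict inequality means a flat plateau yields no peak at all. That is one reason such a search is usually combined with smoothing or replaced by the region-based or model-based alternatives named above.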
