A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

Motivation: Disease classification from molecular measurements typically requires an analysis pipeline that leads from raw, noisy measurements to final classification results. Multi-capillary column ion mobility spectrometry (MCC-IMS) is a promising technology for detecting volatile organic compounds in exhaled breath. From the raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention by human experts. Our goal is to identify a fully automatic pipeline that yields disease classification results competitive with those of an established but subjective and tedious semi-manual process.

Method: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classification across the different medical application scenarios.

Results: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step, and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process and enable an unbiased, high-throughput use of the technology.
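To make the peak identification step more concrete, the following is a minimal Python sketch of a generic local-maxima search on a two-dimensional intensity matrix. It is not the LM implementation evaluated in the paper, whose details are not given here. The function name `local_maxima`, the `threshold` parameter, the 8-neighbour rule, and the toy matrix (rows as retention time indices, columns as drift time indices) are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def local_maxima(matrix: Matrix, threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, intensity) for every point that is at least
    `threshold` and strictly greater than all of its existing neighbours
    in the surrounding 3x3 window (hypothetical peak criterion)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value < threshold:
                continue  # too weak to count as a peak
            is_peak = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the point itself
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    # Ties disqualify the point, so flat plateaus yield no peak.
                    if inside and matrix[rr][cc] >= value:
                        is_peak = False
            if is_peak:
                peaks.append((r, c, value))
    return peaks


# Toy data (assumed layout): rows = retention time, columns = drift time.
spectrum = [
    [0.1, 0.2, 0.1, 0.0, 0.1],
    [0.2, 5.0, 0.3, 0.1, 0.2],
    [0.1, 0.4, 0.2, 0.9, 3.0],
    [0.0, 0.1, 0.1, 0.3, 0.8],
]
peaks = local_maxima(spectrum, threshold=1.0)
print(peaks)
```

In a full pipeline, peak lists of this kind from many measurements would then be passed to the clustering step (for example EM clustering or DBSCAN) before classification. A real implementation would also need smoothing and baseline correction on the raw data, which this sketch omits.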
