Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data

Coupling a multi-capillary column (MCC) with an ion mobility spectrometer (IMS) has opened up many new application areas for gas analysis, especially in a medical context, because volatile organic compounds (VOCs) in exhaled breath can hint at a person's state of health. Obtaining a potential diagnosis from a raw MCC/IMS measurement requires several computational steps, which so far have needed manual interaction, e.g., human evaluation of discovered peaks. We have recently proposed an automated pipeline for this task that does not require human intervention during the analysis. Nevertheless, each computational step still needs improved methods. Compared to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS data is easier and less expensive to obtain, but its peaks are more diffuse and its noise level is higher. MCC/IMS measurements can be described as samples of mixture models (i.e., of convex combinations) of two-dimensional probability distributions. We therefore use the expectation-maximization (EM) algorithm to deconvolve these mixtures and develop methods that improve data processing in three computational steps: denoising, baseline correction and peak clustering. A common theme of these methods is that the mixture components within one model are not homogeneous (e.g., all Gaussian), but of different types. Our evaluation shows that the novel methods outperform the existing ones. We provide Python software implementing all three methods and make our evaluation data available online.
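To illustrate what a mixture with heterogeneous components means in practice, the following is a minimal one-dimensional sketch of EM with one Gaussian component (a "peak") and one uniform component (a flat "background"). It is not the software mentioned in the abstract and does not reproduce the models used there, which are two-dimensional. The function name `em_gauss_uniform`, the choice of exactly these two component types, the initialization, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def em_gauss_uniform(xs, lo, hi, n_iter=100):
    """EM for a two-component mixture (illustrative assumption):
    a Gaussian 'peak' with weight w and a uniform 'background' on [lo, hi]
    with weight 1 - w. Returns the estimates (w, mu, sigma)."""
    n = len(xs)
    # Crude initialization from the overall sample moments.
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    w = 0.5
    u_density = 1.0 / (hi - lo)  # the uniform density has no free parameters

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to the peak.
        resp = []
        for x in xs:
            g = w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
                sigma * math.sqrt(2.0 * math.pi)
            )
            u = (1.0 - w) * u_density
            resp.append(g / (g + u))

        # M-step: weighted maximum-likelihood updates for the Gaussian only.
        total = sum(resp)
        w = total / n
        mu = sum(r * x for r, x in zip(resp, xs)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / total
        sigma = max(math.sqrt(var), 1e-6)  # guard against a collapsing peak

    return w, mu, sigma


# Synthetic example: 700 peak points around 5.0 plus 300 background points.
rng = random.Random(0)
data = [rng.gauss(5.0, 0.5) for _ in range(700)]
data += [rng.uniform(0.0, 10.0) for _ in range(300)]
w_hat, mu_hat, sigma_hat = em_gauss_uniform(data, 0.0, 10.0)
print(f"weight={w_hat:.3f} mean={mu_hat:.3f} sd={sigma_hat:.3f}")
```

Because the two components are of different types, their M-steps differ: the Gaussian gets the usual weighted mean and variance updates, while the uniform component only has its mixture weight re-estimated. The same pattern extends to other component types, as long as each one has a weighted maximum-likelihood update.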
