Data Dependent Peak Model Based Spectrum Deconvolution for Analysis of High Resolution LC-MS Data

A data dependent peak model (DDPM) based spectrum deconvolution method was developed for the analysis of high resolution LC-MS data. To construct extracted ion chromatograms (XICs), a clustering method, density-based spatial clustering of applications with noise (DBSCAN), is applied to all m/z values of an LC-MS data set to group them into XICs. DBSCAN constructs the XICs without the need for a user-defined m/z variation window. After XIC construction, the peaks of molecular ions in each XIC are detected using both the first and the second derivative tests, followed by an optimized chromatographic peak model selection method for peak deconvolution. Six chromatographic peak models are considered: Gaussian, log-normal, Poisson, gamma, exponentially modified Gaussian, and a hybrid of exponential and Gaussian functions. Abundant nonoverlapping peaks are used to find the optimal peak models, which are both data- and retention-time-dependent. Analysis of 18 spiked-in LC-MS data sets demonstrates that the proposed DDPM spectrum deconvolution method outperforms the traditional method. On average, the DDPM approach not only detected 58 more chromatographic peaks in each test data set but also improved retention time and peak area by 3% and 6%, respectively.
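The XIC construction step can be illustrated with a minimal sketch of DBSCAN restricted to one-dimensional m/z values. This is not the authors' implementation: the function name `dbscan_1d`, the parameters `eps` and `min_pts`, and the example m/z values are all assumptions chosen for illustration, and the abstract does not state how the clustering parameters are set in the actual method. Border points are attached to the nearest core point, which is one deterministic convention among several.

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(values, eps, min_pts):
    """Cluster 1-D values with DBSCAN; return a label per input value.

    Labels are 0, 1, ... in order of increasing value; -1 marks noise.
    eps and min_pts are illustrative, hand-picked parameters.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    n = len(xs)

    # A core point has at least min_pts points (itself included) within eps.
    is_core = [
        bisect_right(xs, x + eps) - bisect_left(xs, x - eps) >= min_pts
        for x in xs
    ]

    # In 1-D, core points are density-connected exactly when consecutive
    # core points (in sorted order) lie within eps of each other.
    labels_sorted = [-1] * n
    cluster = -1
    last_core = None
    for k in range(n):
        if not is_core[k]:
            continue
        if last_core is None or xs[k] - xs[last_core] > eps:
            cluster += 1
        labels_sorted[k] = cluster
        last_core = k

    # Border points join the cluster of the nearest core point within eps.
    core_idx = [k for k in range(n) if is_core[k]]
    core_xs = [xs[k] for k in core_idx]
    for k in range(n):
        if is_core[k] or not core_xs:
            continue
        j = bisect_left(core_xs, xs[k])
        near = [c for c in (j - 1, j) if 0 <= c < len(core_xs)]
        best = min(near, key=lambda c: abs(core_xs[c] - xs[k]))
        if abs(core_xs[best] - xs[k]) <= eps:
            labels_sorted[k] = labels_sorted[core_idx[best]]

    # Map labels back to the original input order.
    labels = [-1] * n
    for pos, i in enumerate(order):
        labels[i] = labels_sorted[pos]
    return labels


# Hypothetical m/z values from several scans: two ions plus one stray signal.
mz = [150.1001, 100.0000, 200.5, 100.0003, 150.1000, 100.0002, 150.1003, 100.0005]
print(dbscan_1d(mz, eps=0.001, min_pts=3))  # [1, 0, -1, 0, 1, 0, 1, 0]
```

Each non-negative label corresponds to one candidate XIC, and the stray value at 200.5 is rejected as noise. In this sketch `eps` is an absolute m/z distance; a relative (ppm) tolerance would need the values rescaled first.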
