Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets

Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this comparison is peak alignment, one of the most important but also most challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that coelute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related-peak information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks required by our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The resulting similarity score matrix is passed to an approximation algorithm for the weighted matching problem, which produces the actual alignment.

Results: We demonstrate that related-peak information can improve alignment performance. Performance is evaluated on a set of benchmark datasets, where our method performs competitively with other popular alignment tools.

Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.

Contact: Simon.Rogers@glasgow.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
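The pipeline described in the abstract (grouping-aware pairwise similarity, then approximate weighted matching) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Peak` class, the tolerance values, the form of `base_similarity`, the mixing weight `alpha`, and the way group support is computed are all hypothetical. The matching step uses a simple greedy heuristic, a standard 1/2-approximation for maximum weight matching, as a stand-in for whichever approximation algorithm the authors actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float   # mass-to-charge ratio
    rt: float   # retention time in seconds
    group: int  # hypothetical within-run cluster label from any peak grouping method


def base_similarity(p, q, mz_tol=0.01, rt_tol=30.0):
    """Hypothetical score in [0, 1]; 0 if either tolerance (assumed values) is exceeded."""
    d_mz = abs(p.mz - q.mz) / mz_tol
    d_rt = abs(p.rt - q.rt) / rt_tol
    if d_mz > 1 or d_rt > 1:
        return 0.0
    return (1 - d_mz) * (1 - d_rt)


def similarity_matrix(run_a, run_b, alpha=0.5):
    """Blend each pair's own similarity with support from related peaks.

    Support for (p, q) is the best base similarity between another peak in
    p's group (same run as p) and another peak in q's group (same run as q).
    alpha = 0 ignores grouping entirely. This blend is an assumption.
    """
    base = [[base_similarity(p, q) for q in run_b] for p in run_a]
    scores = []
    for i, p in enumerate(run_a):
        row = []
        for j, q in enumerate(run_b):
            if base[i][j] == 0.0:
                row.append(0.0)  # candidates outside tolerance stay unmatched
                continue
            support = max(
                (base[k][l]
                 for k, pk in enumerate(run_a) if k != i and pk.group == p.group
                 for l, ql in enumerate(run_b) if l != j and ql.group == q.group),
                default=0.0,
            )
            row.append((1 - alpha) * base[i][j] + alpha * support)
        scores.append(row)
    return scores


def greedy_matching(scores):
    """Greedy 1/2-approximation: take the heaviest remaining edge each time."""
    edges = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row) if s > 0),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for _, i, j in edges:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return sorted(matches)


# Toy data: run_a has a peak and its isotope (group 0) plus an unrelated
# peak (group 1) that lies slightly closer to run_b's first peak.
run_a = [Peak(100.000, 100.0, 0), Peak(101.003, 100.0, 0), Peak(100.002, 110.0, 1)]
run_b = [Peak(100.001, 106.0, 0), Peak(101.004, 106.0, 0)]

without_groups = greedy_matching(similarity_matrix(run_a, run_b, alpha=0.0))
with_groups = greedy_matching(similarity_matrix(run_a, run_b, alpha=0.5))
print(without_groups, with_groups)
```

On this toy input, ignoring grouping (`alpha=0.0`) pairs the unrelated peak with the first peak of `run_b`, giving `[(1, 1), (2, 0)]`, whereas including group support (`alpha=0.5`) yields `[(0, 0), (1, 1)]`, because the candidate pair backed by a matching isotope peak now outranks its competitor. This is only meant to show why within-run grouping can change across-run matches, not to reproduce the published scoring function.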
