A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments

Background: Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by aligning chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies.

Results: A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and the similarity of each pair of peak lists is estimated; (2) a guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states, each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and the resulting alignments are subsequently aligned with each other (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs a guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites.

Conclusion: We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. This approach can produce the optimal alignment between an arbitrary number of peak lists, and explicitly models within-state and between-state peak alignment. The accuracy of the proposed method was close to that of manually curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for the processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
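
The pairwise step that underlies the method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the form of the similarity function or the gap handling, so the function names (`peak_similarity`, `align_pair`), the Gaussian retention time weighting, the cosine spectral similarity, and the parameters `rt_scale` and `gap` are all assumptions chosen for illustration.

```python
import math

def peak_similarity(p, q, rt_scale=2.5):
    """Similarity in [0, 1] between two peaks given as (retention_time, spectrum).

    Assumed form (not taken from the paper): cosine similarity of the
    mass spectra multiplied by a Gaussian weight on the retention time gap.
    """
    rt_p, ms_p = p
    rt_q, ms_q = q
    dot = sum(a * b for a, b in zip(ms_p, ms_q))
    norm = math.sqrt(sum(a * a for a in ms_p)) * math.sqrt(sum(b * b for b in ms_q))
    cosine = dot / norm if norm else 0.0
    rt_weight = math.exp(-((rt_p - rt_q) ** 2) / (2.0 * rt_scale ** 2))
    return cosine * rt_weight

def align_pair(a, b, gap=0.3):
    """Global dynamic programming alignment of two peak lists.

    Matching two peaks costs 1 - similarity; leaving a peak unmatched
    costs `gap` (a hypothetical penalty). Returns the matched index
    pairs and the total alignment cost (lower means more similar lists).
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = gap * i, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + 1.0 - peak_similarity(a[i - 1], b[j - 1]), "diag"),
                (cost[i - 1][j] + gap, "up"),    # a[i-1] left unmatched
                (cost[i][j - 1] + gap, "left"),  # b[j-1] left unmatched
            ]
            cost[i][j], move[i][j] = min(options, key=lambda t: t[0])
    # Trace back from the bottom-right cell to recover the matched peaks.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, cost[n][m]
```

For example, aligning `[(10.0, [1, 0, 0]), (20.0, [0, 1, 0]), (30.0, [0, 0, 1])]` with `[(10.5, [1, 0, 0]), (30.4, [0, 0, 1])]` matches the first and last peaks and leaves the middle peak unmatched. In a progressive scheme of the kind described above, the total cost returned for every pair of peak lists could serve as the dissimilarity from which a guide tree is built.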
