Automated Label-free Quantification of Metabolites from Liquid Chromatography–Mass Spectrometry Data

Liquid chromatography coupled to mass spectrometry (LC-MS) has become a standard technology in metabolomics. In particular, label-free quantification based on LC-MS is easily amenable to large-scale studies and thus well suited to clinical metabolomics. Large-scale studies, however, require automated processing of the large and complex LC-MS datasets. We present a novel algorithm for the detection of mass traces and their aggregation into features (i.e. all signals caused by the same analyte species) that is computationally efficient and sensitive and that leads to reproducible quantification results. The algorithm is based on a sensitive detection of mass traces, which are then assembled into features based on mass-to-charge spacing, co-elution information, and a support vector machine–based classifier able to identify potential metabolite isotope patterns. The algorithm is not limited to metabolites but is applicable to a wide range of small molecules (e.g. in lipidomics and peptidomics), as well as to other separation technologies. We assessed the algorithm's robustness with regard to varying noise levels on synthetic data and then validated the approach on experimental data from human plasma samples. We obtained excellent results in a fully automated data-processing pipeline with respect to both accuracy and reproducibility. Relative to state-of-the-art algorithms, ours demonstrated increased precision and recall. The algorithm is available as part of the open-source software package OpenMS and runs on all major operating systems.
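The assembly step described above can be illustrated with a minimal sketch. It is not the OpenMS implementation: it groups already-detected mass traces using only the two simplest criteria named in the abstract, mass-to-charge spacing and co-elution, and leaves out the support vector machine–based isotope pattern classifier entirely. The `MassTrace` type, the greedy grouping strategy, and all parameter values (`mz_tol`, `min_overlap`, `max_isotopes`) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Mass difference between 13C and 12C in Da; used as the expected isotope spacing.
C13_C12_DIFF = 1.0033548


@dataclass
class MassTrace:
    """Hypothetical container for one detected mass trace."""
    mz: float        # centroid m/z of the trace
    rt_start: float  # start of the elution window (s)
    rt_end: float    # end of the elution window (s)


def rt_overlap(a: MassTrace, b: MassTrace) -> float:
    """Fraction of the shorter elution window that is shared by both traces."""
    shared = min(a.rt_end, b.rt_end) - max(a.rt_start, b.rt_start)
    shorter = min(a.rt_end - a.rt_start, b.rt_end - b.rt_start)
    return max(0.0, shared) / shorter if shorter > 0 else 0.0


def group_traces(traces, charge=1, mz_tol=0.005, min_overlap=0.5, max_isotopes=4):
    """Greedily assemble mass traces into features (lists of traces).

    Starting from the lowest unassigned m/z, look for traces at
    mono.mz + k * C13_C12_DIFF / charge that co-elute with the
    candidate monoisotopic trace. All thresholds are illustrative.
    """
    traces = sorted(traces, key=lambda t: t.mz)
    used = set()
    features = []
    for i, mono in enumerate(traces):
        if i in used:
            continue
        used.add(i)
        feature = [mono]
        for k in range(1, max_isotopes):
            expected = mono.mz + k * C13_C12_DIFF / charge
            match = None
            for j, cand in enumerate(traces):
                if j in used:
                    continue
                if (abs(cand.mz - expected) <= mz_tol
                        and rt_overlap(mono, cand) >= min_overlap):
                    match = j
                    break
            if match is None:
                break  # isotope series must be contiguous
            used.add(match)
            feature.append(traces[match])
        features.append(feature)
    return features
```

In this sketch a trace with the right m/z spacing but a non-overlapping elution window is not merged into the feature, which is the role co-elution information plays in the description above. A real pipeline would additionally score candidate isotope patterns, for example by their intensity ratios, before accepting them.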
