WiPP: Workflow for Improved Peak Picking for Gas Chromatography-Mass Spectrometry (GC-MS) Data

The lack of reliable peak detection impedes automated analysis of large-scale gas chromatography-mass spectrometry (GC-MS) metabolomics datasets. The performance and outcome of individual peak-picking algorithms can differ widely depending on the algorithmic approach, the parameters, and the data acquisition method, which makes comparing algorithms difficult. Here we present a workflow for improved peak picking (WiPP), a parameter-optimising, multi-algorithm peak detection workflow for GC-MS metabolomics. WiPP evaluates the quality of detected peaks using a machine learning-based classification scheme based on seven peak classes. The quality information returned by the classifier for each individual peak is merged with results from different peak detection algorithms to create one final high-quality peak set for immediate downstream analysis. Medium- and low-quality peaks are kept for further inspection. By applying WiPP to standard compound mixes and a complex biological dataset, we demonstrate that peak detection is improved through a novel way of assigning peak quality, automated parameter optimisation, and integration of results across different embedded peak-picking algorithms. Furthermore, our approach can provide an impartial performance comparison of different peak-picking algorithms. WiPP is freely available on GitHub (https://github.com/bihealth/WiPP) under the MIT licence.
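The merging step described above can be illustrated with a minimal sketch. It is not WiPP's implementation: the `Peak` class, the `merge_peak_sets` function, the `rt_tolerance` parameter, and the three quality tier labels are hypothetical names chosen for illustration, and matching peaks across algorithms by retention time alone is a simplifying assumption. The sketch pools peaks reported by several algorithms, keeps the best-rated peak in each retention-time cluster, and separates high-quality peaks from those kept for inspection.

```python
from dataclasses import dataclass

# Hypothetical quality tiers (not WiPP's seven peak classes); lower rank is better.
QUALITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Peak:
    rt: float        # retention time in seconds
    algorithm: str   # name of the peak picker that reported the peak
    quality: str     # tier assumed to come from a trained classifier


def merge_peak_sets(peak_sets, rt_tolerance=1.0):
    """Pool peaks from several algorithms and keep one per retention-time cluster."""
    pooled = sorted((p for peaks in peak_sets for p in peaks), key=lambda p: p.rt)
    clusters = []
    for peak in pooled:
        # Join the current cluster if the peak lies within the tolerance of the
        # previous peak; otherwise start a new cluster.
        if clusters and peak.rt - clusters[-1][-1].rt <= rt_tolerance:
            clusters[-1].append(peak)
        else:
            clusters.append([peak])
    # Within each cluster, keep the peak with the best quality tier.
    representatives = [min(c, key=lambda p: QUALITY_RANK[p.quality]) for c in clusters]
    final = [p for p in representatives if p.quality == "high"]
    to_inspect = [p for p in representatives if p.quality != "high"]
    return final, to_inspect


# Made-up example peaks from two hypothetical algorithms "A" and "B".
set_a = [Peak(100.2, "A", "high"), Peak(250.0, "A", "low")]
set_b = [Peak(100.6, "B", "medium"), Peak(400.1, "B", "high")]
final, to_inspect = merge_peak_sets([set_a, set_b])
```

In this example the two peaks near 100 s fall into one cluster and the high-quality one is retained, while the low-quality peak at 250 s is set aside for inspection. A realistic merge would also need to compare mass spectra, which this sketch omits.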
