Fusion of Quality Evaluation Metrics and Convolutional Neural Network Representations for ROI Filtering in LC-MS.

Region of interest (ROI) extraction is a fundamental step in analyzing metabolomic datasets acquired by liquid chromatography-mass spectrometry (LC-MS). However, noise and background signals in LC-MS data often degrade the quality of the extracted ROIs. Effective ROI evaluation algorithms are therefore needed to eliminate false positives while keeping the false-negative rate as low as possible. In this study, we propose a deep fused filter of ROIs (dffROI), which improves the accuracy of ROI extraction by combining handcrafted evaluation metrics with representations learned by a convolutional neural network (CNN). To evaluate its performance, dffROI was compared with peakonly (which relies on CNN-learned representations) and with five handcrafted metrics on three LC-MS datasets and one gas chromatography-mass spectrometry (GC-MS) dataset. The results show that dffROI achieves higher accuracy, a higher true-positive rate, and a lower false-positive rate: on the test set, its accuracy, true-positive rate, and false-positive rate are 0.9841, 0.9869, and 0.0186, respectively. Its classification error rate (1.59%) is significantly lower than that of peakonly (2.73%). Model-agnostic feature importance analysis demonstrates the necessity of fusing handcrafted evaluation metrics with CNN representations. By virtue of information fusion and end-to-end learning, dffROI is an automatic, robust, and universal method for ROI filtering. It is implemented in Python and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the KPIC2 framework previously proposed by our group to facilitate the analysis of real metabolomic LC-MS datasets.
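The figures quoted above follow the usual confusion-matrix definitions when a true ROI is treated as the positive class. In particular, the classification error rate is simply 1 − accuracy, which is why an accuracy of 0.9841 corresponds to the 1.59% error rate. The sketch below illustrates these definitions under that assumption (label 1 = true ROI, label 0 = noise or background ROI). The class and function names are hypothetical and are not part of the dffROI package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted ROI labels with ground-truth labels."""
    tp: int  # true ROIs correctly kept
    fn: int  # true ROIs wrongly rejected
    fp: int  # noise/background ROIs wrongly kept
    tn: int  # noise/background ROIs correctly rejected


def confusion_counts(y_true, y_pred):
    """Tally the four outcomes; labels must be 1 (true ROI) or 0 (false ROI)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fn, fp, tn)


def filter_metrics(c):
    """Accuracy, true-positive rate, false-positive rate, and error rate."""
    if c.tp + c.fn == 0 or c.fp + c.tn == 0:
        raise ValueError("need at least one true and one false ROI")
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)
    return {
        "accuracy": accuracy,
        "tpr": c.tp / (c.tp + c.fn),   # fraction of true ROIs that are kept
        "fpr": c.fp / (c.fp + c.tn),   # fraction of false ROIs that slip through
        "error_rate": 1.0 - accuracy,  # misclassified fraction of all ROIs
    }


if __name__ == "__main__":
    # Toy labels (not data from the study): 10 true ROIs, 10 false ROIs.
    y_true = [1] * 10 + [0] * 10
    y_pred = [1] * 9 + [0] + [1] + [0] * 9
    print(filter_metrics(confusion_counts(y_true, y_pred)))
```

Reporting the error rate alongside the true-positive and false-positive rates is useful because accuracy alone can hide an imbalance between the two kinds of mistakes a filter makes: discarding genuine ROIs versus keeping noise.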
