Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing

As metabolomics research grows, more studies involve large numbers of samples. Because of technical limitations of the Liquid Chromatography–Mass Spectrometry (LC/MS) platform, such samples often have to be processed in multiple batches, and data characteristics often differ between batches. In this work, we focus specifically on data generated in multiple batches on the same LC/MS instrument. Traditional preprocessing methods treat all samples as a single group. This practice can cause errors in peak alignment that cannot be corrected by applying batch effect correction methods post hoc. We developed a new approach that addresses the batch effect issue at the preprocessing stage, resulting in better peak detection, alignment, and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented within the existing workflow of the apLCMS platform. When applied to multi-batch data, generated both from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. It is available as part of the apLCMS package; the download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
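
The alignment problem described above can be illustrated with a toy example. The sketch below is not the apLCMS algorithm and does not use its API. It assumes two hypothetical compounds (A and B) at the same m/z, a made-up +4 s retention-time drift in the second batch, and a simple gap-based grouping rule with an arbitrary 1 s tolerance. It also assumes that both batches contain the same features in the same elution order, so that a single median shift can be estimated between them.

```python
from statistics import median

# Toy retention times (seconds) for two compounds, A and B, at the same m/z.
# Assumption: batch2 carries a systematic +4 s retention-time drift.
batches = {
    "batch1": {"A": [100.0, 100.2, 99.9], "B": [104.1, 103.9, 104.0]},
    "batch2": {"A": [104.0, 104.1, 103.8], "B": [108.0, 107.9, 108.2]},
}
TOL = 1.0  # hypothetical RT tolerance (s) for merging peaks into one feature


def group_by_rt(peaks, tol=TOL):
    """Sort peaks by RT and start a new group whenever the gap exceeds tol."""
    groups = []
    for rt, label in sorted(peaks):
        if groups and rt - groups[-1][-1][0] <= tol:
            groups[-1].append((rt, label))
        else:
            groups.append([(rt, label)])
    return groups


def labels(groups):
    """Compounds present in each group (labels are used only for evaluation)."""
    return [sorted({label for _, label in g}) for g in groups]


def batch_features(batch):
    """Align within one batch and summarise each feature by its median RT."""
    peaks = [(rt, c) for c, rts in batch.items() for rt in rts]
    return [(median(rt for rt, _ in g), g[0][1]) for g in group_by_rt(peaks)]


# Pooled preprocessing: all samples treated as a single group.
pooled = [(rt, c) for b in batches.values() for c, rts in b.items() for rt in rts]
pooled_groups = group_by_rt(pooled)

# Batch-aware preprocessing: align within each batch, estimate the
# between-batch RT shift, correct it, then align across batches.
ref = batch_features(batches["batch1"])
other = batch_features(batches["batch2"])
shift = median(o[0] - r[0] for o, r in zip(other, ref))
corrected = [(rt - shift, c) for rt, c in other]
aware_groups = group_by_rt(ref + corrected)

print("pooled:     ", labels(pooled_groups))
print("batch-aware:", labels(aware_groups))
```

In this toy setting, pooled grouping yields three features, the middle one merging compound A from the second batch with compound B from the first. Intensity-based batch correction applied afterwards cannot undo such a mismatch, because it operates on the columns of a feature table that is already misaligned. The batch-aware variant recovers two clean features.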
