MetTailor: dynamic block summary and intensity normalization for robust analysis of mass spectrometry data in metabolomics

MOTIVATION Accurate cross-sample peak alignment and reliable intensity normalization are critical steps for robust quantitative analysis in untargeted metabolomics, since tandem mass spectrometry (MS/MS) is rarely used for compound identification. Shortcomings in these data processing steps can therefore easily introduce false positives through misalignments and erroneous normalization adjustments in large sample studies. RESULTS In this work, we developed a software package, MetTailor, featuring two novel data preprocessing steps to remedy drawbacks in the existing processing tools. First, we propose a novel dynamic block summarization (DBS) method for correcting misalignments from peak alignment algorithms, which alleviates the missing data problem caused by misalignments. To verify that realignments are correct, we propose using the cross-sample consistency in isotopic intensity ratios as a quality metric. Second, we developed a flexible intensity normalization procedure that adjusts normalizing factors against the temporal variations in total ion chromatogram (TIC) along the chromatographic retention time (RT). We first evaluated the DBS algorithm using a curated metabolomics dataset, illustrating that the algorithm identifies misaligned peaks and correctly realigns them with good sensitivity. We next demonstrated the DBS algorithm and the RT-based normalization procedure in a large-scale dataset featuring >100 serum samples from a primary dengue infection study. Although the initial alignment was successful for the majority of peaks, the DBS algorithm still corrected ∼7000 misaligned peaks in this dataset, and many recovered peaks showed isotopic patterns consistent with the peaks they were realigned to. In addition, the RT-based normalization algorithm efficiently removed visible local variations in TIC along the RT without sacrificing sensitivity in detecting differentially expressed metabolites.
AVAILABILITY AND IMPLEMENTATION The R package MetTailor is freely available at the SourceForge website http://mettailor.sourceforge.net/. CONTACT hyung_won_choi@nuhs.edu.sg SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
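
The idea behind the RT-based normalization described under RESULTS can be illustrated with a minimal sketch. This is not MetTailor's actual procedure or API: it assumes a simplified scheme in which peaks are grouped into fixed-width RT windows, a local TIC is summed per window and sample, and each sample is rescaled toward the cross-sample median of that local TIC. The function name `rt_local_normalize`, the `window` parameter, and the input layout are all hypothetical choices made for illustration.

```python
from statistics import median


def rt_local_normalize(peaks, n_samples, window=60.0):
    """Rescale intensities per RT window (hypothetical, simplified scheme).

    peaks: list of (rt_seconds, [intensity per sample]); None marks a missing value.
    Returns (normalized_peaks, factors), where factors[w][j] is the scaling
    factor applied to sample j in RT window w.
    """
    # Local TIC: sum of observed intensities per RT window and sample.
    local_tic = {}
    for rt, intensities in peaks:
        w = int(rt // window)
        totals = local_tic.setdefault(w, [0.0] * n_samples)
        for j, x in enumerate(intensities):
            if x is not None:
                totals[j] += x

    # Factor = cross-sample median of the local TIC / this sample's local TIC.
    # Samples with no signal in a window are left unscaled (factor 1.0).
    factors = {}
    for w, totals in local_tic.items():
        positive = [t for t in totals if t > 0]
        ref = median(positive) if positive else 0.0
        factors[w] = [ref / t if t > 0 else 1.0 for t in totals]

    # Apply the window-specific factor to every observed intensity.
    normalized = []
    for rt, intensities in peaks:
        f = factors[int(rt // window)]
        scaled = [None if x is None else x * f[j] for j, x in enumerate(intensities)]
        normalized.append((rt, scaled))
    return normalized, factors


# Toy input: sample 2 has twice the signal of sample 1 in the first RT window only.
peaks = [(10.0, [100.0, 200.0]), (20.0, [50.0, 100.0]), (70.0, [30.0, 30.0])]
normalized, factors = rt_local_normalize(peaks, n_samples=2)
print(factors[0])  # [1.5, 0.75]
print(factors[1])  # [1.0, 1.0]
```

A single global factor per sample would leave the first window imbalanced or distort the second; window-specific factors remove the local variation while leaving regions that already agree untouched. Hard window boundaries are a simplification, and a smoothed weighting along RT would avoid discontinuities at window edges.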
