RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data.

Metabolomic data are frequently acquired using chromatographically coupled mass spectrometry (MS) platforms. For such datasets, the first step in data analysis is feature detection, where a feature is defined by a mass and retention time. While a feature is typically derived from a single compound, a spectrum of mass signals is a more accurate representation of the mass spectrometric signal for a given metabolite. Here, we report a novel feature grouping method that operates in an unsupervised manner to group signals from MS data into spectra without relying on the predictability of in-source phenomena. We additionally address a fundamental bottleneck in metabolomics, the annotation of MS-level signals, by incorporating indiscriminant MS/MS (idMS/MS) data implicitly: feature detection is performed on both MS and idMS/MS data, and feature-feature relationships are determined simultaneously from the MS and idMS/MS data. This approach facilitates identification of metabolites using in-source MS and/or idMS/MS spectra from a single experiment, reduces quantitative analytical variation compared to single-feature measures, and decreases false positive annotations of unpredictable phenomena as novel compounds. The tool is released as a freely available R package, RAMClustR, and is sufficiently versatile to group features from any chromatographic-spectrometric platform or feature-finding software.
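The grouping idea can be illustrated with a minimal sketch. It is not the RAMClustR implementation (which is an R package), and the abstract does not specify the similarity function or the clustering step. The sketch assumes that two features belong to the same spectrum when they co-elute and their intensities co-vary across samples. The Gaussian scoring, the `sigma_rt` and `sigma_r` parameters, the `min_similarity` threshold, the single-linkage grouping, and the feature names and values are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no co-variation information
    return sxy / math.sqrt(sxx * syy)


def similarity(f, g, sigma_rt=1.0, sigma_r=0.5):
    """Assumed score: product of a retention-time term and a correlation term."""
    rt_term = math.exp(-((f["rt"] - g["rt"]) ** 2) / (2 * sigma_rt ** 2))
    r = pearson(f["intensities"], g["intensities"])
    cor_term = math.exp(-((1 - r) ** 2) / (2 * sigma_r ** 2))
    return rt_term * cor_term


def group_features(features, min_similarity=0.5, **kwargs):
    """Single-linkage grouping: join any pair scoring above the threshold."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if similarity(features[i], features[j], **kwargs) >= min_similarity:
            parent[find(i)] = find(j)

    groups = {}
    for i, feature in enumerate(features):
        groups.setdefault(find(i), []).append(feature["name"])
    return sorted(groups.values())


# Hypothetical features: MS and idMS/MS signals are pooled into one list,
# each with a retention time (s) and intensities across four samples.
features = [
    {"name": "MS_181.07", "rt": 120.0, "intensities": [10, 20, 30, 40]},
    {"name": "MS_203.05", "rt": 120.3, "intensities": [5, 10, 15, 21]},
    {"name": "idMSMS_163.06", "rt": 120.1, "intensities": [2, 4, 6, 8]},
    {"name": "MS_365.10", "rt": 300.0, "intensities": [40, 30, 20, 10]},
]

spectra = group_features(features)
```

Here the three co-eluting, co-varying signals end up in one group and the late-eluting signal stays alone. Because MS and idMS/MS features are scored together, a single group can hold both in-source and fragment signals, which is the property the abstract relies on for spectral matching.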
