A benchmark spike‐in data set for biomarker identification in metabolomics

The development and validation of innovative approaches to biomarker selection are of paramount importance in many ‐omics technologies. Unfortunately, testing new methods on real data is difficult, because the “true” biomarkers in a real data set are never known with certainty. In this paper, we present a publicly available metabolomic ultra performance liquid chromatography–mass spectrometry (UPLC–MS) spike‐in data set for apples. The data set consists of 10 control samples and three spiked sets, each the same size as the control set, in which naturally occurring compounds are added at different concentrations. Because the spiked compounds are known, the data set can serve as a test bed for assessing the performance of new algorithms and comparing them with previously published results.
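To illustrate how a spike-in design can act as a test bed, the following is a minimal sketch and not the procedure used in the paper. It assumes synthetic Gaussian data in place of the apple measurements, ranks features by the absolute Welch t-statistic between control and spiked samples (a simple stand-in for any biomarker selection method), and scores the top-ranked features against the known spiked features. All function names, the number of features, and the effect size are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic comparing sample b against sample a."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se if se > 0 else 0.0


def rank_features(control, spiked):
    """Return feature indices sorted by decreasing |t|.

    control, spiked: lists of samples, each sample a list of intensities.
    """
    n_features = len(control[0])
    scores = []
    for j in range(n_features):
        a = [sample[j] for sample in control]
        b = [sample[j] for sample in spiked]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def precision_recall(selected, truth):
    """Score a selection against the known spiked features."""
    hits = len(set(selected) & set(truth))
    return hits / len(selected), hits / len(truth)


def simulate(n_samples=10, n_features=200, truth=(0, 1, 2, 3, 4),
             shift=3.0, seed=1):
    """Synthetic stand-in: unit-variance noise, spiked features shifted.

    These values are assumptions, not properties of the real data set.
    """
    rng = random.Random(seed)
    control = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
               for _ in range(n_samples)]
    spiked = [[rng.gauss(shift if j in truth else 0.0, 1.0)
               for j in range(n_features)]
              for _ in range(n_samples)]
    return control, spiked


if __name__ == "__main__":
    truth = (0, 1, 2, 3, 4)
    control, spiked = simulate(truth=truth)
    top_k = rank_features(control, spiked)[:len(truth)]
    precision, recall = precision_recall(top_k, truth)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

With real data, `control` and `spiked` would be replaced by the measured feature intensities, and `truth` by the features known to belong to the spiked compounds; any selection method that returns a ranked feature list could then be scored the same way.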
