Metabolic profiling and the metabolome-wide association study: significance level for biomarker identification.

High-throughput metabolic profiling via the metabolome-wide association study (MWAS) is a powerful new approach to identifying biomarkers of disease risk, but it poses methodological challenges: high dimensionality, strong collinearity, peak overlap within metabolic spectral data, multiple testing, and the selection of a suitable significance threshold. We define the metabolome-wide significance level (MWSL) as the threshold required to control the family-wise error rate, estimated through a permutation approach. We used ¹H NMR spectroscopic profiles of 24 h urinary collections from the INTERMAP study. Our results show that the MWSL depends primarily on sample size and spectral resolution. The MWSL estimates can be used to guide the selection of discriminatory biomarkers in MWA studies. In a simulation study, we compare the statistical performance of the MWSL approach with that of two variants of the orthogonal partial least-squares (OPLS) method with respect to statistical power, false positive rate, and correspondence of the ranking of the most significant spectral variables. Our results show that the MWSL approach, as estimated by the univariate t test, is not outperformed by OPLS and offers a fast and simple method for detecting disease-related discriminatory features in human NMR urinary metabolic profiles.
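The permutation idea behind a family-wise threshold can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the parameter values are all assumptions made for illustration. The sketch shuffles the outcome labels, records the largest absolute univariate t statistic across all spectral variables in each permutation, and takes the (1 - alpha) quantile of those maxima. Because every variable shares the same degrees of freedom, this critical |t| corresponds to a single p-value threshold, which plays the role of the MWSL.

```python
import numpy as np


def two_sample_t(X, y):
    """Pooled-variance t statistic for every column of X.

    X : (n_samples, n_variables) array of spectral intensities.
    y : boolean array of length n_samples marking the two groups.
    """
    a, b = X[y], X[~y]
    na, nb = a.shape[0], b.shape[0]
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))


def critical_abs_t(X, y, alpha=0.05, n_perm=500, seed=0):
    """Permutation estimate of the |t| cutoff controlling the FWER at alpha.

    Shuffling y breaks any association with X while preserving the
    correlation structure among the columns of X.
    """
    rng = np.random.default_rng(seed)
    max_abs_t = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        max_abs_t[i] = np.abs(two_sample_t(X, y_perm)).max()
    return np.quantile(max_abs_t, 1 - alpha)


# Toy data (hypothetical): 200 samples, 50 collinear "spectral" variables.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.normal(size=(n, p))
X[:, 1:] += 0.8 * X[:, :-1]   # make neighbouring variables correlated
y = np.arange(n) < n // 2     # two equally sized groups

t_crit = critical_abs_t(X, y)
print(f"critical |t| for FWER 0.05: {t_crit:.2f}")
```

In this sketch the cutoff is stricter than the single-test value of about 1.96, and with correlated variables it is typically less strict than a Bonferroni correction over all variables, since collinearity reduces the effective number of independent tests.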
