Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions

Standard approaches to evaluating the impact of single nucleotide polymorphisms (SNPs) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes that are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. We simulated data to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log-transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11 and found that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the use of a likelihood ratio test based on Gaussian mixture models to increase power and yields a model with improved sensitivity and interpretability.
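
To make the idea of a likelihood ratio test on mixing proportions concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the two Gaussian components (means and standard deviations) are shared across genotypes, that only the proportion of the upper component differs by genotype under the alternative, that all three genotype classes are observed, and that the statistic is referred to a chi-square distribution with 2 degrees of freedom. The function names (`fit_mixture`, `mixture_lrt`, `simulate`), the EM initialization, and the simulation settings are all illustrative choices. The restricted test mentioned in the abstract is not covered.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(y, groups, n_iter=300):
    """EM for a two-component Gaussian mixture with shared component
    parameters and one mixing proportion per distinct label in `groups`.
    Returns (log-likelihood, means, standard deviations, proportions)."""
    labels = sorted(set(groups))
    n = len(y)
    ys = sorted(y)
    mean = sum(y) / n
    sd0 = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    # Crude start (an assumption): quartiles as means, pooled SD, equal mixing.
    mu = [ys[n // 4], ys[(3 * n) // 4]]
    sigma = [sd0, sd0]
    pi = {g: 0.5 for g in labels}  # proportion of the upper component

    def densities(v, g):
        a = (1.0 - pi[g]) * normal_pdf(v, mu[0], sigma[0])
        b = pi[g] * normal_pdf(v, mu[1], sigma[1])
        return a, b

    for _ in range(n_iter):
        # E-step: posterior probability that each value is from component 1.
        resp = []
        for v, g in zip(y, groups):
            a, b = densities(v, g)
            resp.append(b / max(a + b, 1e-300))
        # M-step: group-specific proportions, shared means and SDs.
        for g in labels:
            r = [r_i for r_i, g_i in zip(resp, groups) if g_i == g]
            pi[g] = sum(r) / len(r)
        weights = ([1.0 - r for r in resp], resp)
        for k in (0, 1):
            wk = max(sum(weights[k]), 1e-12)
            mu[k] = sum(w * v for w, v in zip(weights[k], y)) / wk
            var = sum(w * (v - mu[k]) ** 2 for w, v in zip(weights[k], y)) / wk
            sigma[k] = max(math.sqrt(var), 1e-3 * sd0)  # guard against collapse

    loglik = sum(math.log(max(sum(densities(v, g)), 1e-300))
                 for v, g in zip(y, groups))
    return loglik, mu, sigma, pi


def mixture_lrt(y, genotypes):
    """Test H0: one common mixing proportion vs. H1: one per genotype."""
    if len(set(genotypes)) != 3:
        raise ValueError("this sketch assumes three observed genotype classes")
    ll_null = fit_mixture(y, [0] * len(y))[0]
    ll_alt, _, _, pi = fit_mixture(y, genotypes)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, df = 2
    return stat, p_value, pi


def simulate(n=600, seed=1):
    """Toy data: genotype shifts the chance of the upper component only."""
    rng = random.Random(seed)
    upper_prob = {0: 0.2, 1: 0.5, 2: 0.8}  # illustrative values
    genotypes = [rng.choice([0, 1, 2]) for _ in range(n)]
    y = [rng.gauss(4.0, 1.0) if rng.random() < upper_prob[g]
         else rng.gauss(0.0, 1.0) for g in genotypes]
    return y, genotypes


y, genotypes = simulate()
stat, p_value, pi = mixture_lrt(y, genotypes)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Both fits start from the same initial values so that the null model is nested in the alternative along the EM path. In practice, multiple random starts and a check of the reference distribution would be advisable, since EM can stop at local optima.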
