A Multiple Testing Approach to High-Dimensional Association Studies with an Application to the Detection of Associations between Risk Factors of Heart Disease and Genetic Polymorphisms

Statistical Applications in Genetics and Molecular Biology

We present an approach to association studies involving a dozen or so 'response' variables and a few hundred 'explanatory' variables that emphasizes transparency, simplicity, and protection against spurious results. The proposed methods are largely non-parametric, and each analysis is completed by applying the Benjamini-Hochberg multiple testing method to the resulting p-values. We illustrate the approach by using the REGRESS dataset to detect associations between risk factors of heart disease and genetic polymorphisms. Special attention is paid to the book-keeping and information-management aspects of data analysis, which allow the creation of an informative and reasonably digestible 'map of relationships': the end-product of an association study as far as statistics is concerned.
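For readers unfamiliar with the Benjamini-Hochberg method, the following is a minimal sketch of the standard step-up procedure, not the authors' own code. The function name `benjamini_hochberg`, the parameter `q` (the target false discovery rate), and the example p-values are assumptions introduced purely for illustration; in a study like the one described, the inputs would be the p-values from the individual response-by-explanatory-variable tests.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure:
    sort the m p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject the k hypotheses with the
    smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls at or below its BH threshold.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank

    # Reject every hypothesis ranked at or below k (step-up rule).
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per tested association.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejections = benjamini_hochberg(example_p, q=0.05)
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected if a larger p-value further down the sorted list meets its threshold, which is why the loop keeps the largest qualifying rank rather than stopping at the first failure.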
