A multimetric approach to analysis of genome-wide association by single markers and composite likelihood

Two case/control studies with different phenotypes, marker densities, and microarrays were examined for the most significant single markers in defined regions. Both show a pronounced bias toward exaggerated significance that increases with the number of observed markers and would increase further with imputed markers. Bonferroni adjustment eliminates this bias, which allows the most significant single marker to be combined by principal component analysis with a Malecot model composite likelihood, the latter evaluated by a permutation procedure to allow for multiple dependent markers. The principal component, intermediate between the two metrics, identifies the only demonstrated causal locus as most significant even in the preliminary analysis and clearly recognizes the strongest candidate in the other sample. Because the three metrics (most significant single marker, composite likelihood, and their principal component) are correlated, choosing the n smallest P values by each test gives fewer than 3n regions for follow-up in the next stage. In this way, methods that respond differently to marker selection and density are given approximately equal weight and compared economically, without expressing an untested prejudice or sacrificing the most significant results of any of them. Large numbers of cases, controls, and markers are by themselves insufficient to control type I and type II errors, so efficient use of multiple metrics with Bonferroni adjustment promises to be valuable in identifying causal variants and optimal design simultaneously.
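
The selection scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of -log10(P) as the score, and the toy P values are assumptions made for illustration, and the composite-likelihood P values are taken as given rather than computed by permutation. The principal component uses the closed form for two standardized, positively correlated variables, where the first component is proportional to their sum.

```python
import math
from statistics import mean, pstdev


def bonferroni_min_p(p_values):
    """Bonferroni-adjusted P for the most significant marker in one region.

    The smallest P value is multiplied by the number of markers tested
    in the region and capped at 1.
    """
    return min(1.0, len(p_values) * min(p_values))


def first_pc_of_two(x, y):
    """First principal component scores of two standardized metrics.

    Assumption: x and y are positively correlated, so the leading
    eigenvector of their 2x2 correlation matrix is (1, 1) / sqrt(2).
    """
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return [((a - mx) / sx + (b - my) / sy) / math.sqrt(2)
            for a, b in zip(x, y)]


def top_n(scores, n):
    """Indices of the n regions with the largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n])


def followup_regions(single_p, composite_p, n):
    """Union of the top-n regions under each of the three metrics.

    single_p: Bonferroni-adjusted P of the best single marker per region.
    composite_p: composite-likelihood P per region (assumed precomputed).
    """
    s = [-math.log10(p) for p in single_p]
    c = [-math.log10(p) for p in composite_p]
    pc = first_pc_of_two(s, c)
    return top_n(s, n) | top_n(c, n) | top_n(pc, n)


# Toy example with five hypothetical regions (values are illustrative only).
single_p = [1e-6, 0.5, 0.01, 0.3, 0.04]
composite_p = [1e-4, 0.6, 0.2, 1e-3, 0.05]
n = 2
regions = followup_regions(single_p, composite_p, n)
print(sorted(regions), len(regions), "of at most", 3 * n)
```

The union can never exceed 3n regions, and because the three lists overlap when the metrics are correlated, the number carried to the next stage is typically smaller.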
