Large upward bias in estimation of locus-specific effects from genomewide scans.

The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effect of each identified locus. Here, it is shown that these two objectives cannot be met reliably with a single data set of currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can be virtually independent of the true effect size, even in studies of large samples when the true effect size is small. The bias does, however, diminish asymptotically as sample size increases. The bias arises because the LOD score is a function of the locus-specific effect-size estimate, so that the observed statistical significance and the effect-size estimate are highly correlated. When the LOD score is maximized over the many pointwise tests conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way to obtain reliable estimates of locus-specific effects, and then only if one does not condition on obtaining statistical significance. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. These findings have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that keeping this bias in mind will lead to more realistic interpretation of, and extrapolation from, the results of genomewide scans.
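The selection mechanism described above can be illustrated with a minimal sketch. It does not reproduce the variance-components analysis. Instead, it assumes a deliberately simplified model in which the effect-size estimate at a correctly localized locus is normally distributed around the true effect with a fixed standard error `se`. The LOD score is taken to be that of a one-sided, one-degree-of-freedom test, LOD = z^2 / (2 ln 10) with z = estimate / se. The LOD threshold of 3, the function names, and the numbers used are illustrative assumptions. Conditioning on LOD >= 3 turns the estimate into a truncated normal whose mean has a closed form. A Monte Carlo check is included.

```python
import math
import random

LOD_THRESHOLD = 3.0  # assumed genomewide significance threshold for linkage


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x: float) -> float:
    """Standard normal upper tail, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def z_threshold(lod: float = LOD_THRESHOLD) -> float:
    """z-score at which a one-sided 1-df test reaches `lod` (LOD = z^2 / (2 ln 10))."""
    return math.sqrt(2.0 * math.log(10.0) * lod)


def power(effect: float, se: float, lod: float = LOD_THRESHOLD) -> float:
    """Probability that the estimate at the true locus reaches the LOD threshold."""
    return norm_sf(z_threshold(lod) - effect / se)


def mean_estimate_given_significance(effect: float, se: float,
                                     lod: float = LOD_THRESHOLD) -> float:
    """E[estimate | LOD >= lod]: mean of a normal truncated below at z_threshold * se."""
    a = z_threshold(lod) - effect / se  # standardized truncation point
    return effect + se * norm_pdf(a) / norm_sf(a)


def simulate_conditional_mean(effect: float, se: float, lod: float = LOD_THRESHOLD,
                              n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check: average only the estimates whose LOD passes the threshold."""
    rng = random.Random(seed)
    cut = z_threshold(lod) * se  # estimate needed for LOD >= lod (one-sided)
    kept = [x for x in (rng.gauss(effect, se) for _ in range(n)) if x >= cut]
    return sum(kept) / len(kept) if kept else float("nan")


if __name__ == "__main__":
    # Hypothetical settings: a small sample (se = 0.025) and a much larger one (se = 0.005).
    for effect, se in [(0.02, 0.025), (0.05, 0.025), (0.05, 0.005)]:
        print(f"true={effect:.3f} se={se:.3f} "
              f"power={power(effect, se):.4f} "
              f"E[est|sig]={mean_estimate_given_significance(effect, se):.3f} "
              f"simulated={simulate_conditional_mean(effect, se):.3f}")
```

Under these assumed values (`se` = 0.025, with effects read as the proportion of trait variance due to the locus), true effects of 0.02 and 0.05 both give a mean estimate of about 0.10 among significant scans. The estimate at the peak thus reflects the threshold more than the true effect. Shrinking `se` to 0.005, as a much larger sample would, essentially removes the bias. An independent replication sample that is not selected on significance gives an unbiased estimate. It reaches LOD 3 only with probability equal to the power, roughly 4% here for a true effect of 0.05, even though the locus is real.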
