Estimation of heritability is fundamental in genetic studies. In recent years, heritability estimation using linear mixed models (LMMs) has gained popularity, because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses either the maximum likelihood (ML) or the restricted maximum likelihood (REML) approach. Existing methods for constructing confidence intervals and estimating standard errors for both ML and REML rely on asymptotic properties. However, the assumptions underlying these asymptotic results are often violated due to the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and to inflated or deflated confidence intervals. Here, we show that the probability that the genetic component is estimated as zero is often high even when the true heritability is bounded away from zero, emphasizing the need for accurate confidence intervals. We further show that confidence intervals estimated by state-of-the-art methods are highly inaccurate, especially when the true heritability is either relatively low or relatively high. Such biases are present, for example, in estimates of heritability of gene expression in the GTEx study, and of lipid profiles in the LURIC study. We propose a computationally efficient method, Accurate LMM-Based confidence Intervals (ALBI), for estimating the distribution of the heritability estimator and for constructing accurate confidence intervals. Our method can be used as an add-on to existing methods for heritability and variance-component estimation, such as GCTA, FaST-LMM, GEMMA, or EMMA. ALBI is available at http://www.cs.tau.ac.il/~heran/cozygene/software/albi.html.

bioRxiv preprint first posted online Nov. 24, 2015; doi: http://dx.doi.org/10.1101/031492. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
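The abstract's claim that the heritability estimate can land on the boundary of the parameter space with high probability can be explored with a small simulation. The sketch below is not ALBI and is not taken from the paper; it is a brute-force Monte Carlo illustration under assumptions made here: a kinship matrix simulated from random Gaussian "genotypes", no fixed effects (in which case ML and REML coincide), and a grid search over heritability values. All function names, sample sizes, and the grid resolution are hypothetical choices.

```python
import numpy as np

def simulate_kinship_eigenvalues(n, m, rng):
    """Eigenvalues of a simulated kinship matrix K = Z Z^T / m (assumed setup)."""
    Z = rng.standard_normal((n, m))  # stand-in for standardized genotypes
    K = Z @ Z.T / m
    return np.linalg.eigvalsh(K)

def ml_heritability(y_rot, d, grid):
    """Grid-search ML estimate of h^2 from a phenotype rotated into K's eigenbasis.

    In that basis observation i has variance sigma^2 * (h2 * d[i] + 1 - h2),
    so the total variance sigma^2 can be profiled out in closed form.
    """
    h = grid[:, None]
    w = h * d[None, :] + (1.0 - h)                     # variance multipliers
    sigma2 = np.mean(y_rot[None, :] ** 2 / w, axis=1)  # profiled sigma^2
    loglik = -0.5 * (len(d) * np.log(sigma2) + np.sum(np.log(w), axis=1))
    return grid[np.argmax(loglik)]

def estimator_distribution(h2_true, d, n_reps, grid, rng):
    """Monte Carlo sample of the h^2 estimator at a given true heritability."""
    sd = np.sqrt(h2_true * d + 1.0 - h2_true)  # total variance fixed to 1
    est = np.empty(n_reps)
    for r in range(n_reps):
        y_rot = sd * rng.standard_normal(len(d))  # draw directly in the eigenbasis
        est[r] = ml_heritability(y_rot, d, grid)
    return est

rng = np.random.default_rng(0)
d = simulate_kinship_eigenvalues(n=200, m=2000, rng=rng)
grid = np.linspace(0.0, 1.0, 101)
est = estimator_distribution(h2_true=0.2, d=d, n_reps=500, grid=grid, rng=rng)

print("fraction of estimates at 0:", np.mean(est == 0.0))
print("fraction of estimates at 1:", np.mean(est == 1.0))
print("2.5% / 97.5% quantiles of the estimator:", np.quantile(est, [0.025, 0.975]))
```

Because the phenotype is drawn directly in the eigenbasis of the kinship matrix, only the eigenvalues are needed, which keeps each replicate cheap. Repeating the run over a range of `h2_true` values gives a rough picture of how the estimator's distribution, including its point masses at the boundaries, changes with the true heritability.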