Fast calculation of restricted maximum likelihood methods for unstructured high-throughput data

Linear mixed models are widely used to analyse unbalanced data with missing values in a broad range of applications. The restricted maximum likelihood (REML) method is often preferred for estimating the covariance parameters of such models because it accounts for the degrees of freedom used by the fixed effects and so reduces the bias of the variance-parameter estimates. However, the restricted log-likelihood function involves log determinants of a complicated covariance matrix, which are computationally prohibitive to evaluate. Efficiently estimating the underlying model parameters and quantifying the accuracy of those estimates requires the observed or the Fisher information matrix, and standard approaches to computing either matrix are also computationally prohibitive. Customized algorithms are therefore in high demand to keep the REML method scalable as high-throughput unbalanced data sets grow. In this paper, we explore how to leverage an information splitting technique and dedicated matrix transforms to significantly reduce the computations. Together with fill-in-reducing multifrontal sparse direct solvers, this approach improves the performance of the computation process.
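As a hedged illustration of the quantities involved, the sketch below assumes the standard variance-components form y = Xβ + Zu + e with V = Σ_i θ_i V_i. Up to a constant, the restricted log-likelihood is ℓ_R(θ) = −½[log|V| + log|XᵀV⁻¹X| + yᵀPy], where P = V⁻¹ − V⁻¹X(XᵀV⁻¹X)⁻¹XᵀV⁻¹. The observed information has entries yᵀPV_iPV_jPy − ½ tr(PV_iPV_j) and the Fisher information has entries ½ tr(PV_iPV_j), so their average, ½ yᵀPV_iPV_jPy, needs no trace terms; this averaged matrix is what average-information (AI) REML iterates with. The code is a small dense NumPy example of that idea, not the paper's information-splitting scheme or its multifrontal sparse implementation; the function names reml_terms and ai_reml and the simulated data are hypothetical.

```python
import numpy as np


def reml_terms(y, X, Vs, theta):
    """Restricted log-likelihood (up to a constant), score vector and
    average-information matrix for V = sum_i theta[i] * Vs[i].

    Dense illustration only: V^{-1} and P are formed explicitly, which is
    reasonable for small n. A scalable code would work with a sparse
    Cholesky factor instead.
    """
    V = sum(t * Vi for t, Vi in zip(theta, Vs))
    L = np.linalg.cholesky(V)                      # V = L L^T
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))
    Vinv = np.linalg.inv(V)
    Vinv_X = Vinv @ X
    XtVinvX = X.T @ Vinv_X
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    # P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}
    P = Vinv - Vinv_X @ np.linalg.solve(XtVinvX, Vinv_X.T)
    Py = P @ y
    loglik = -0.5 * (logdet_V + logdet_XtVinvX + y @ Py)
    # Score: d l / d theta_i = -1/2 [ tr(P V_i) - y' P V_i P y ]
    score = np.array([-0.5 * (np.trace(P @ Vi) - Py @ Vi @ Py) for Vi in Vs])
    # Average information 1/2 (I_obs + I_Fisher) = 1/2 y' P V_i P V_j P y,
    # built from "working variates" w_i = V_i P y, with no trace terms.
    W = np.column_stack([Vi @ Py for Vi in Vs])
    ai = 0.5 * W.T @ P @ W
    return loglik, score, ai


def ai_reml(y, X, Vs, theta0, n_iter=50, floor=1e-6):
    """Newton-type iterations theta <- theta + AI^{-1} score."""
    theta = np.asarray(theta0, dtype=float)
    trace = []
    for _ in range(n_iter):
        loglik, score, ai = reml_terms(y, X, Vs, theta)
        trace.append(loglik)
        # Clip at a small positive floor so V stays positive definite.
        theta = np.maximum(theta + np.linalg.solve(ai, score), floor)
    return theta, trace


# Hypothetical example: one random grouping factor plus residual error,
# V = sigma_u^2 Z Z' + sigma_e^2 I, fitted to simulated data.
rng = np.random.default_rng(0)
n, q = 200, 20
groups = rng.integers(0, q, size=n)
Z = np.zeros((n, q))
Z[np.arange(n), groups] = 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(0.0, np.sqrt(2.0), size=q)
y = X @ np.array([1.0, 0.5]) + Z @ u + rng.normal(0.0, 1.0, size=n)
Vs = [Z @ Z.T, np.eye(n)]

theta_hat, trace = ai_reml(y, X, Vs, theta0=[1.0, 1.0])
print("estimated (sigma_u^2, sigma_e^2):", theta_hat)
```

Forming V⁻¹ and P densely costs O(n³) per iteration; replacing those steps with a fill-in-reducing sparse Cholesky factorization is the natural place where sparse direct solvers would enter.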
