Statistical Methods for Expression Quantitative Trait Loci (eQTL) Mapping

Summary

Traditional genetic mapping has largely focused on identifying loci that affect one, or at most a few, complex traits. Microarrays measure the abundance of thousands of gene expression transcripts, themselves complex traits, and a number of recent investigations have treated these measurements as phenotypes in mapping studies. Combining traditional quantitative trait loci (QTL) mapping methods with microarray data is a powerful approach with demonstrated utility in a number of recent biological investigations. These expression quantitative trait loci (eQTL) studies resemble traditional QTL studies in that a main goal is to identify the genomic locations to which the expression traits are linked. However, eQTL studies probe thousands of expression transcripts; as a result, standard multi-trait QTL mapping methods, designed to handle at most tens of traits, do not directly apply. One possible approach is to use single-trait QTL mapping methods to analyze each transcript separately. This leads to an increased number of false discoveries, because no correction is made for multiple tests across transcripts. Similarly, repeatedly applying, at each marker, methods for identifying differentially expressed transcripts fails to account for multiple tests across markers. Here, we demonstrate the deficiencies of these approaches and propose a mixture over markers (MOM) model that shares information across both markers and transcripts. All methods are evaluated using simulated data as well as data from an F2 mouse cross in a study of diabetes. Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power. The MOM model is also the only one capable of finding two genome regions previously shown to be involved in diabetes.
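The multiple-testing problem described above can be illustrated with a small simulation. The sketch below is not the MOM model; it only shows, under simplifying assumptions, how testing each transcript separately at a single marker produces false discoveries when no transcript is truly linked. The assumptions are ours: two genotype classes rather than the three of an F2 cross, unit-variance Gaussian expression values, a known-variance z-test, and the Benjamini-Hochberg procedure as a stand-in correction. All function names are hypothetical.

```python
import math
import random


def two_group_z_pvalue(values, genotypes):
    """Two-sided p-value for a difference in mean expression between
    genotype classes 0 and 1, assuming a known unit variance."""
    g0 = [v for v, g in zip(values, genotypes) if g == 0]
    g1 = [v for v, g in zip(values, genotypes) if g == 1]
    diff = sum(g1) / len(g1) - sum(g0) / len(g0)
    se = math.sqrt(1.0 / len(g1) + 1.0 / len(g0))
    z = diff / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, q=0.05):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def simulate_null_scan(n_transcripts=2000, n_animals=60, seed=1):
    """Simulate transcripts with no true linkage to one marker (assumed setup)."""
    rng = random.Random(seed)
    genotypes = [i % 2 for i in range(n_animals)]  # balanced two-class marker
    pvalues = []
    for _ in range(n_transcripts):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_animals)]
        pvalues.append(two_group_z_pvalue(values, genotypes))
    return pvalues


pvalues = simulate_null_scan()
naive_hits = sum(p <= 0.05 for p in pvalues)          # no correction
bh_hits = len(benjamini_hochberg(pvalues, q=0.05))    # corrected across transcripts
print("uncorrected discoveries:", naive_hits)
print("Benjamini-Hochberg discoveries:", bh_hits)
```

Because every simulated transcript is null, each uncorrected discovery is a false one, and roughly 5% of the 2000 transcripts are expected to pass the 0.05 threshold by chance. A correction across transcripts removes most or all of them, but this sketch still treats a single marker in isolation; it does not address multiple tests across markers, which is the gap the MOM model is proposed to close.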
