A stochastic expectation and maximization algorithm for detecting quantitative trait-associated genes

MOTIVATION: Most biological traits may be correlated with underlying gene expression patterns, which are partially determined by DNA sequence variation. The correlations between gene expression and quantitative traits are essential for understanding gene function and dissecting gene regulatory networks.

RESULTS: In the present study, we adopted a novel statistical method, called the stochastic expectation and maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and to identify the genetic loci controlling variation in gene expression. In the first step, gene expression levels measured in microarray experiments were assigned to two clusters according to the strength of their association with the phenotypes of the quantitative trait under investigation. In the second step, the genes associated with the trait were mapped to genetic loci of the genome. Because gene expression levels are quantitative, the genetic loci controlling these expression traits are called expression quantitative trait loci (eQTL). We applied the SEM algorithm to a real dataset from a barley genetic experiment that recorded both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to the seven chromosomes of the barley genome. The SEM algorithm and the results of the barley data analysis are useful to scientists in bioinformatics and plant breeding.

AVAILABILITY AND IMPLEMENTATION: The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.
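
The first step, assigning genes to two clusters with a stochastic EM procedure, can be illustrated with a minimal sketch. This is not the authors' model or their R program. It assumes, purely for illustration, that each gene has already been reduced to one standardized association score (for example, a regression slope of expression on the trait divided by its standard error), and that scores follow a mixture of two zero-mean normal distributions: a narrow one for unassociated genes and a wide one for trait-associated genes. The function name `stochastic_em`, the starting values, the simulated data, and the 0.5 selection threshold are all hypothetical.

```python
import math
import random


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def stochastic_em(scores, n_iter=200, burn_in=100, seed=0):
    """Two-cluster stochastic EM on per-gene association scores.

    Cluster 0: zero-mean normal with small sd (assumed: no association).
    Cluster 1: zero-mean normal with larger sd (assumed: trait-associated).
    Returns (pi1, sd0, sd1, freq), where freq[i] is the fraction of
    post-burn-in iterations in which gene i was drawn into cluster 1.
    """
    rng = random.Random(seed)
    pi1, sd0, sd1 = 0.5, 1.0, 3.0  # hypothetical starting values
    freq = [0.0] * len(scores)
    for it in range(n_iter):
        # E-step: posterior probability that each gene belongs to cluster 1.
        post = []
        for s in scores:
            a = pi1 * normal_pdf(s, sd1)
            b = (1.0 - pi1) * normal_pdf(s, sd0)
            # If both densities underflow, treat the score as extreme.
            post.append(a / (a + b) if a + b > 0.0 else 1.0)
        # S-step: draw a hard cluster label from each posterior.
        z = [1 if rng.random() < p else 0 for p in post]
        # M-step: re-estimate parameters from the sampled labels.
        n1 = sum(z)
        n0 = len(z) - n1
        if n0 > 1 and n1 > 1:
            pi1 = n1 / len(z)
            sd0 = math.sqrt(sum(s * s for s, k in zip(scores, z) if k == 0) / n0)
            sd1 = math.sqrt(sum(s * s for s, k in zip(scores, z) if k == 1) / n1)
            if sd0 > sd1:
                # Keep cluster 1 as the wide cluster (guards against label switching).
                sd0, sd1 = sd1, sd0
                pi1 = 1.0 - pi1
                z = [1 - k for k in z]
        # Accumulate membership frequencies after the burn-in period.
        if it >= burn_in:
            for i, k in enumerate(z):
                freq[i] += k / (n_iter - burn_in)
    return pi1, sd0, sd1, freq


# Simulated scores (hypothetical): 180 unassociated genes, 20 associated genes.
sim = random.Random(1)
null_scores = [sim.gauss(0.0, 1.0) for _ in range(180)]
assoc_scores = [sim.gauss(0.0, 6.0) for _ in range(20)]
pi1, sd0, sd1, freq = stochastic_em(null_scores + assoc_scores)
selected = [i for i, f in enumerate(freq) if f > 0.5]
print(f"pi1={pi1:.2f} sd0={sd0:.2f} sd1={sd1:.2f} selected={len(selected)}")
```

Unlike standard EM, which weights every gene by its posterior probability, the stochastic variant draws a hard label for each gene at every iteration and updates the parameters from those draws. Averaging the labels after burn-in gives a per-gene membership frequency that can be thresholded to pick candidate genes for the second step, the eQTL mapping.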
