Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation.

Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identifying the specific genetic variants responsible remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis of an observed difference lies in cis or in trans to the gene. This distinction bears directly on the physical location of the underlying variant and may also affect its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Patterns that predict which genes are commonly imbalanced could therefore serve as a useful tool for identifying cis-acting variation and could also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reducing this feature set to four factors revealed that only one factor significantly differentiated commonly imbalanced from nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published human data set obtained using different technical and statistical methods. Finally, we show that variation in this single allelic imbalance-associated factor is partially explained by the density of genes around the target gene (allelic imbalance is less probable for genes in gene-dense regions) and, to a lesser extent, by the evenness of the gene's expression across tissues and the magnitude of negative selection on its putative regulatory regions.
These results suggest that functional cis-regulatory variants are distributed nonrandomly across the human genome, perhaps because of local differences in evolutionary constraint.
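The factor-reduction step can be illustrated with a minimal sketch. The abstract names neither the dimensionality-reduction method nor the statistical test, so both are assumptions here. The sketch factors a nonnegative genes x features matrix (for example, motif counts and polymorphism and divergence levels) into k = 4 factors with nonnegative matrix factorization using Lee-Seung multiplicative updates. It then compares each factor's per-gene scores between commonly imbalanced and nonimbalanced genes with a two-sided Mann-Whitney U test (normal approximation). The function names `nmf`, `mann_whitney_p`, and `factor_association` are hypothetical.

```python
# Sketch only: NMF and the Mann-Whitney test are assumed choices,
# not the method stated in the abstract. All names are hypothetical.
import math

import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative genes x features matrix X ~= W @ H with
    Lee-Seung multiplicative updates (Frobenius loss).
    W (genes x k) holds per-gene factor scores; H (k x features) holds
    each factor's feature loadings."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Alternate updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (average ranks for ties, no tie correction in the variance)."""
    values = np.concatenate([np.asarray(a, float), np.asarray(b, float)])
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    n1, n2 = len(a), len(b)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


def factor_association(X, imbalanced, k=4, seed=0):
    """Reduce features to k factors, then test each factor's gene scores
    for a difference between imbalanced and nonimbalanced genes."""
    W, H = nmf(X, k, seed=seed)
    mask = np.asarray(imbalanced, dtype=bool)
    pvals = [mann_whitney_p(W[mask, f], W[~mask, f]) for f in range(k)]
    return W, H, pvals
```

With real data one would also need to choose k by some model-selection criterion, correct the per-factor p-values for multiple testing, and check that results are stable across random seeds, because NMF solutions are not unique.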
