Identification of cis-regulatory sequence variations in individual genome sequences

Cis-regulatory sequence variants make numerous functional contributions to human genetic disease. For instance, variants that disrupt an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Clinical genome sequence analysis currently focuses on identifying protein-altering variants, yet cis-regulatory mutations can have similarly strong effects. New technologies now enable sequencing beyond the exome. This reveals variation across the non-coding 98% of the genome, which contains the regulatory sequences responsible for developmental and physiological patterns of gene activity. Our capacity to identify causal regulatory mutations is improving, but predicting how sequence changes alter the function of regulatory DNA remains a major challenge. Here we survey existing methods and software for predicting functional variants in the cis-regulatory sequences that govern gene transcription and RNA processing.
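Many of these prediction methods share one core idea. They compare how well the reference and alternative alleles match a transcription factor's binding profile, typically a position weight matrix (PWM) of the kind distributed by databases such as JASPAR. The Python sketch below illustrates that idea and rests on several assumptions. The count matrix `TOY_COUNTS` is hypothetical and is not a real HNF4A or JASPAR profile. The background is uniform, and the pseudocount of 0.8 is an arbitrary choice. The helper names are illustrative and do not come from any published tool. A negative delta suggests the variant weakens the best predicted site overlapping it, as a binding-site-disrupting mutation would.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# HYPOTHETICAL toy count matrix (consensus GATTCA), NOT a real JASPAR profile.
# Each row is one motif position; values are counts of A, C, G, T in that order.
TOY_COUNTS = [
    [1, 1, 7, 1],
    [8, 0, 1, 1],
    [0, 1, 1, 8],
    [1, 0, 0, 9],
    [0, 9, 1, 0],
    [7, 1, 1, 1],
]


def counts_to_log_odds(counts, pseudocount=0.8, background=0.25):
    """Convert per-position base counts into log2-odds scores vs. a uniform background.

    The pseudocount is split evenly across the four bases, so bases never seen
    at a position still receive a finite (strongly negative) score.
    """
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount
        pwm.append({
            base: math.log2(((count + pseudocount / 4) / total) / background)
            for base, count in zip(BASES, row)
        })
    return pwm


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def best_score(pwm, seq):
    """Highest PWM score over all full-width windows of seq, on both strands."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if set(window) <= set(BASES):  # skip windows containing N or other symbols
                best = max(best, sum(col[b] for col, b in zip(pwm, window)))
    return best


def variant_delta(pwm, ref_seq, pos, alt):
    """Change in best site score (alt minus ref) for a single-base substitution.

    Only windows overlapping the 0-based position `pos` are scored, so a
    stronger site elsewhere in the sequence cannot mask the variant's effect.
    """
    ref_seq = ref_seq.upper()
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]
    width = len(pwm)
    # Every width-long window inside [lo, hi) contains pos.
    lo, hi = max(0, pos - width + 1), pos + width
    return best_score(pwm, alt_seq[lo:hi]) - best_score(pwm, ref_seq[lo:hi])


if __name__ == "__main__":
    pwm = counts_to_log_odds(TOY_COUNTS)
    # C>T at the most conserved toy-motif position (0-based index 8 in this sequence).
    print(variant_delta(pwm, "CCCCGATTCACCCC", 8, "T"))
```

Run as a script, this prints a negative value of about -5.52, because the substitution replaces the best-supported base at that motif position with a base never observed there. The function only scores windows that overlap the variant, on both strands. This keeps the comparison local to the site the variant can actually affect. Scores like this are only a first filter, however. On their own they do not establish that a site is bound or functional in vivo.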
