Handling Permutation in Sequence Comparison: Genome-Wide Enhancer Prediction in Vertebrates by a Novel Non-Linear Alignment Scoring Principle

Enhancers have been reported to evolve by permutation without change of function. This raises the problem of how to predict enhancer elements that are hidden from alignment-based approaches because co-linearity has been lost. Alignment-free algorithms have been proposed as one possible solution; however, this approach is hampered by several problems inherent to its underlying working principle. Here we present a new approach that combines the strengths of alignment and alignment-free techniques in a single algorithm. It predicts enhancers from the query and target sequences alone, regardless of whether the regulatory logic is co-linear or reshuffled. To test the approach, we applied it to enhancer prediction across the ~450 Myr evolutionary distance between human and medaka. Subsequent in vivo validation confirmed its efficacy: 82% (9/11) of the predicted medaka regions showed reporter activity. These include five candidates with partially co-linear and four with reshuffled motif patterns. Orthology of flanking genes and conservation of the detected co-linear motifs indicate that these candidates are likely functionally equivalent enhancers. In sum, our results demonstrate that the proposed principle predicts mutated as well as permuted enhancer regions at an encouragingly high rate.
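The Python sketch below contrasts the two views of sequence similarity that the abstract describes. It is only an illustration and is not the scoring principle of this work. It assumes that each enhancer is reduced to an ordered list of motif hits. The motif labels and the functions `colinear_score`, `order_free_score` and `compare` are hypothetical.

- The co-linear score is the longest common subsequence of the two lists, so it is alignment-like and rewards shared motifs only when they appear in the same order.
- The order-free score is the size of the multiset intersection, so it is alignment-free and ignores order.

```python
from collections import Counter


def colinear_score(query, target):
    """Alignment-like view (assumption): length of the longest common
    subsequence, so shared motifs count only if they keep their order."""
    n, m = len(query), len(target)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if query[i - 1] == target[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def order_free_score(query, target):
    """Alignment-free view (assumption): number of shared motif hits,
    counted as a multiset intersection regardless of order."""
    return sum((Counter(query) & Counter(target)).values())


def compare(query, target):
    """Report both views; a positive order_gap means some shared motifs
    occur in a reshuffled order in the target."""
    c = colinear_score(query, target)
    f = order_free_score(query, target)
    return {"colinear": c, "order_free": f, "order_gap": f - c}


# Hypothetical motif-hit lists for one human region and two medaka candidates.
human = ["SOX", "PAX", "OTX", "LHX"]
medaka_colinear = ["SOX", "PAX", "OTX", "LHX"]
medaka_shuffled = ["LHX", "OTX", "SOX", "PAX"]

print(compare(human, medaka_colinear))
print(compare(human, medaka_shuffled))
```

The two candidates get different results:

- **Co-linear candidate:** both scores are 4.
- **Reshuffled candidate:** it shares all four motifs with the human region but keeps only two of them in the same order. The co-linear score is therefore 2, while the order-free score stays at 4.

This shows why an alignment-only score would miss permuted enhancers, and why order-free counting alone discards information about conserved co-linear motifs.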
