Full title: Evolution is in the details: Regulatory differences in modern human and Neanderthal
Short title: Regulatory differences in modern human and Neanderthal

Transcription factor (TF) proteins play a critical role in the regulation of eukaryotic gene expression through sequence-specific binding to genomic locations known as transcription factor binding sites. Here we present the TFBSFootprinter tool, which combines transcription-relevant data from six large empirical datasets (Ensembl, JASPAR, FANTOM5, ENCODE, GTEx, and GTRD) to more accurately predict functional sites. A complete analysis integrating all experimental datasets can be performed on genes in the human genome, and a limited analysis can be performed on a total of 125 vertebrate species. As a use case, we used TFBSFootprinter to study sites of genomic variation between modern human and Neanderthal promoters. We found significant differences in binding affinity for 110 transcription factors, which are enriched for homeobox and brain. Analysis of single-cell data shows that a subset of these (CUX1, CUX2, ESRRG, FOXP1, FOXP2, MEF2C, POU6F2, PRRX1, and RORA) co-occur as marker genes in L4 glutamatergic neurons. Differential binding sites for these transcription factors were found in 74 target genes, the largest number of which were found in the bidirectional promoter of the key mitochondrial-function genes FARS2 and LYRM4.
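
The comparison of binding affinity between two promoter variants can be illustrated with a generic position weight matrix (PWM) log-odds score. The following is a minimal sketch only, not the TFBSFootprinter implementation: the count matrix, the uniform background, the pseudocount value, and the function names are all hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical position frequency matrix (counts per base at 4 positions).
# The values are invented for illustration and are not taken from JASPAR.
PFM = {
    "A": [8, 0, 1, 9],
    "C": [1, 0, 8, 0],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 0],
}
# Assumed uniform background and pseudocount; real tools may use other values.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PSEUDOCOUNT = 0.8


def log_odds_matrix(pfm, background, pseudocount):
    """Convert counts to per-position log2 odds against the background."""
    length = len(next(iter(pfm.values())))
    matrix = []
    for i in range(length):
        total = sum(pfm[b][i] for b in "ACGT") + pseudocount
        column = {}
        for b in "ACGT":
            freq = (pfm[b][i] + pseudocount * background[b]) / total
            column[b] = math.log2(freq / background[b])
        matrix.append(column)
    return matrix


def best_score(sequence, matrix):
    """Return the highest log-odds score over all windows of the sequence.

    The sequence must be uppercase and contain only A, C, G, and T.
    """
    width = len(matrix)
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def score_difference(ref_seq, alt_seq, pfm=PFM):
    """Best-window score of alt_seq minus best-window score of ref_seq."""
    matrix = log_odds_matrix(pfm, BACKGROUND, PSEUDOCOUNT)
    return best_score(alt_seq, matrix) - best_score(ref_seq, matrix)
```

A positive value from `score_difference` means the second sequence contains a better match to the toy motif than the first; for example, `score_difference("TTATCATT", "TTAGCATT")` is positive because only the second sequence contains the consensus `AGCA`.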
