High-resolution comparative analysis of great ape genomes

A spotlight on great ape genomes

Most nonhuman primate genomes generated to date have been “humanized” owing to their many gaps and the reliance on guidance by the reference human genome. To remove this humanizing effect, Kronenberg et al. generated and assembled long-read genomes of a chimpanzee, an orangutan, and two humans and compared them with a previously generated gorilla genome. This analysis recognized genomic structural variation specific to humans and particular ape lineages. Comparisons between human and chimpanzee cerebral organoids showed down-regulation of the expression of specific genes in humans, relative to chimpanzees, related to noncoding variation identified in this analysis.

Science, this issue p. eaar6343

Analysis of long-read great ape and human genomes identifies human-specific changes affecting brain gene expression.

INTRODUCTION Understanding the genetic differences that make us human is a long-standing endeavor that requires the comprehensive discovery and comparison of all forms of genetic variation within great ape lineages.

RATIONALE The varied quality and completeness of ape genomes have limited comparative genetic analyses. To eliminate this contiguity and quality disparity, we generated human and nonhuman ape genome assemblies without the guidance of the human reference genome. These new genome assemblies enable both coarse and fine-scale comparative genomic studies.

RESULTS We sequenced and assembled two human, one chimpanzee, and one orangutan genome using high-coverage (>65x) single-molecule, real-time (SMRT) long-read sequencing technology. We also sequenced more than 500,000 full-length complementary DNA samples from induced pluripotent stem cells to construct de novo gene models, increasing our knowledge of transcript diversity in each ape lineage.
The new nonhuman ape genome assemblies improve gene annotation and genomic contiguity (by 30- to 500-fold), resulting in the identification of larger synteny blocks (by 22- to 74-fold) when compared to earlier assemblies. Including the latest gorilla genome, we now estimate that 83% of the ape genomes can be compared in a multiple sequence alignment. We observe a modest increase in single-nucleotide variant divergence compared to previous genome analyses and estimate that 36% of human autosomal DNA is subject to incomplete lineage sorting. We fully resolve most common repeat differences, including full-length retrotransposons such as the African ape-specific endogenous retroviral element PtERV1. We show that the spread of this element independently in the gorilla and chimpanzee lineages likely resulted from a founder element that failed to segregate to the human lineage because of incomplete lineage sorting.

The improved sequence contiguity allowed a more systematic discovery of structural variation (>50 base pairs in length) (see the figure). We detected 614,186 ape deletions, insertions, and inversions, assigning each to specific ape lineages. Unbiased genome scaffolding (optical maps, bacterial artificial chromosome sequencing, and fluorescence in situ hybridization) led to the discovery of large, previously unknown complex inversions in gene-rich regions. Of the 17,789 fixed human-specific insertions and deletions, we focus on those of potential functional effect. We identify 90 that are predicted to disrupt genes and an additional 643 that likely affect regulatory regions, more than doubling the number of human-specific deletions that remove regulatory sequence in the human lineage. We investigate the association of structural variation with changes in human-chimpanzee brain gene expression using cerebral organoids as a proxy for expression differences.
Genes associated with fixed structural variants (SVs) show a pattern of down-regulation in human radial glial neural progenitors, whereas human-specific duplications are associated with up-regulated genes in human radial glial and excitatory neurons (see the figure).

CONCLUSION The improved ape genome assemblies provide the most comprehensive view to date of intermediate-size structural variation and highlight several dozen genes associated with structural variation and brain-expression differences between humans and chimpanzees. These new references will provide a stepping stone for the completion of great ape genomes at a quality commensurate with the human reference genome and, ultimately, an understanding of the genetic differences that make us human.

SMRT assemblies and SV analyses. (Top) Contiguity of the de novo assemblies. (Bottom, left to right) For each ape, SV detection was done against the human reference genome (as represented by a dot plot of an inversion). Human-specific SVs, identified by comparing ape SVs and population genotyping (0/0, homozygous reference), were compared to single-cell gene expression differences [range: low (dark blue) to high (dark red)] in primary and organoid tissues. Each heatmap row is a gene that intersects an insertion or deletion (green), duplication (cyan), or inversion (light green).

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single– to mega–base pair–sized variants.
We identified ~17,000 fixed human-specific structural variants, identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
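The human-specific filter mentioned in the figure legend (ape SVs compared against population genotypes, with 0/0 meaning homozygous reference) can be illustrated with a small sketch. This is not the authors' pipeline: the `SVCall` record layout, the `is_fixed_human_specific` helper, and the rule that a variant must be seen in every nonhuman ape assembly are assumptions made for illustration. Only the >50 base pair size threshold and the 0/0 genotype convention come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

MIN_SV_LENGTH = 50  # bp; the text defines structural variation as >50 bp

# Assumption: "human-specific" is approximated as "seen in all three
# nonhuman ape assemblies when compared against the human reference".
NONHUMAN_APES = frozenset({"chimpanzee", "gorilla", "orangutan"})


@dataclass(frozen=True)
class SVCall:
    """One SV called against the human reference (hypothetical layout)."""
    sv_id: str
    sv_type: str                      # "insertion", "deletion", or "inversion"
    length: int                       # size in base pairs
    apes_with_sv: FrozenSet[str]      # ape assemblies carrying the variant
    human_genotypes: Tuple[str, ...]  # VCF-style strings such as "0/0", "0/1"


def is_fixed_human_specific(call: SVCall) -> bool:
    """Return True if the call passes the illustrative filter."""
    if call.length <= MIN_SV_LENGTH:
        return False  # too short to count as structural variation
    if not NONHUMAN_APES <= call.apes_with_sv:
        return False  # not shared by every nonhuman ape assembly
    # "Fixed" here means every genotyped human is homozygous reference;
    # a call with no human genotypes is rejected rather than assumed fixed.
    return bool(call.human_genotypes) and all(
        gt == "0/0" for gt in call.human_genotypes
    )


calls = [
    SVCall("sv1", "deletion", 1200, NONHUMAN_APES, ("0/0", "0/0", "0/0")),
    SVCall("sv2", "insertion", 300, frozenset({"chimpanzee"}), ("0/0", "0/0")),
    SVCall("sv3", "deletion", 800, NONHUMAN_APES, ("0/0", "0/1")),
    SVCall("sv4", "insertion", 40, NONHUMAN_APES, ("0/0",)),
]
human_specific = [c.sv_id for c in calls if is_fixed_human_specific(c)]
print(human_specific)  # ['sv1']
```

In this toy input only `sv1` passes: `sv2` is chimpanzee-only, `sv3` is still polymorphic in humans, and `sv4` falls below the size threshold. A real analysis would also have to account for incomplete lineage sorting and genotyping error, which this sketch ignores.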
