Transcriptional fates of human-specific segmental duplications in brain

Despite the importance of duplicate genes for evolutionary adaptation, gene annotation is often incomplete, incorrect, or missing in regions of segmental duplication. We developed an approach that combines long-read sequencing with hybridization capture to recover full-length transcript information and confidently distinguish between nearly identical paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions; the captured cDNA was then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology. Because each long read spans multiple paralogous sequence variants, it can be assigned to a specific copy even when the genes are highly identical. We examined the expression of 19 gene families in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and, in particular, how that process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models because of novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically by defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach should be broadly applicable to genes in structurally complex regions of other genomes, where the duplication process creates novel genes important for adaptive traits.
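The idea of assigning a long read to one paralog by virtue of multiple paralogous sequence variants (PSVs) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the PSV table, the gene names `GENE_A` and `GENE_B`, the function `assign_paralog`, and the thresholds `min_informative` and `min_fraction` are all hypothetical, chosen only to show how agreement across several PSVs could resolve copies that are >99% identical.

```python
from collections import Counter

# Hypothetical PSV table: alignment position -> base carried by each paralog.
# Positions and gene names are illustrative only, not taken from the study.
PSVS = {
    105: {"GENE_A": "A", "GENE_B": "G"},
    412: {"GENE_A": "C", "GENE_B": "T"},
    977: {"GENE_A": "G", "GENE_B": "G"},  # identical in both copies: uninformative
    1530: {"GENE_A": "T", "GENE_B": "C"},
}


def assign_paralog(read_bases, psvs, min_informative=2, min_fraction=0.8):
    """Assign a read to a paralog from the bases it shows at PSV positions.

    read_bases maps alignment position -> base observed in the read.
    Returns the paralog name, or None when the evidence is too thin or mixed.
    Both thresholds are assumed values for this sketch.
    """
    votes = Counter()
    informative = 0
    for pos, alleles in psvs.items():
        base = read_bases.get(pos)
        # Skip positions the read does not cover or where the paralogs agree.
        if base is None or len(set(alleles.values())) < 2:
            continue
        matches = [name for name, allele in alleles.items() if allele == base]
        if not matches:
            continue  # base matches no known copy (e.g. a sequencing error)
        informative += 1
        for name in matches:
            votes[name] += 1
    if informative < min_informative:
        return None
    ranked = votes.most_common(2)
    best, count = ranked[0]
    # Reject ties and reads whose PSVs do not agree consistently.
    if len(ranked) > 1 and ranked[1][1] == count:
        return None
    if count / informative < min_fraction:
        return None
    return best


full_length_read = {105: "A", 412: "C", 977: "G", 1530: "T"}
mixed_read = {105: "A", 412: "T", 1530: "C"}
short_read = {105: "G"}

print(assign_paralog(full_length_read, PSVS))
print(assign_paralog(mixed_read, PSVS))
print(assign_paralog(short_read, PSVS))
```

In this toy setup, a read covering a single PSV is left unassigned, whereas a read spanning several concordant PSVs is assigned; a read with conflicting PSVs (as might arise from a chimeric molecule or an unsampled copy) is also left unassigned rather than forced into one paralog.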
