Emergence of a Homo sapiens-specific gene family and chromosome 16p11.2 CNV susceptibility

Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage—a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11.2 rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.
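The reported correlations between BOLA2 copy number and RNA expression (r = 0.36) or protein level (r = 0.65) are correlation coefficients. As a minimal sketch, assuming these are Pearson coefficients (the abstract does not state which statistic was used), such a value could be computed as below. The function name `pearson_r` and the sample values are hypothetical and purely illustrative; they are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy values for illustration only (NOT measurements from the paper):
    # per-individual gene copy number and an arbitrary expression measure.
    copy_number = [3, 4, 4, 5, 6, 6, 7, 8]
    expression = [10.2, 9.8, 12.1, 11.5, 13.0, 12.2, 15.4, 14.1]
    print(f"r = {pearson_r(copy_number, expression):.2f}")
```

A value near 1 would indicate that expression rises almost linearly with copy number, while moderate values such as those reported leave much of the variation in expression unexplained by copy number alone.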
