Population-wide sampling of retrotransposon insertion polymorphisms using deep sequencing and efficient detection

Abstract
Active retrotransposons have played important roles during evolution and continue to shape our genomes today, notably through genetic polymorphisms that underlie a diverse set of diseases. However, population-level studies of human retrotransposon insertion polymorphisms (RIPs) based on deep whole-genome sequencing remain scarce, despite the clear need to characterize RIPs thoroughly in the general population. Herein, we present Specific Insertions Detector (SID), a novel and efficient computational tool for detecting non-reference RIPs. Using paired-end reads from simulated and real datasets, we demonstrate that SID is well suited to high-depth whole-genome sequencing data. We construct a comprehensive RIP database from a population of 90 Han Chinese individuals sequenced to a mean depth of 68× per individual. In total, we identify 9342 recent RIPs, 8433 of which are novel compared with dbRIP; these novel RIPs comprise 5826 Alu, 2169 long interspersed nuclear element 1 (L1), 383 SVA, and 55 long terminal repeat (LTR) insertions. Among the 9342 RIPs, 4828 are located in gene regions and 5 in protein-coding regions. We show that RIPs can, in principle, serve as an informative resource for population-evolution and phylogenetic analyses. Accounting for demographic effects, the frequency spectrum of RIPs indicates weak negative selection on SVA and L1 elements but approximately neutral evolution of Alu elements. SID is a powerful open-source program for detecting non-reference RIPs. The non-reference RIP dataset we built substantially expands the known diversity of RIPs in the general population and should be invaluable to researchers interested in many aspects of human evolution, genetics, and disease. As a proof of concept, we demonstrate that RIPs can be used as biomarkers in much the same way as single nucleotide polymorphisms (SNPs).
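
The abstract does not describe how SID works internally. Paired-end detectors of non-reference mobile element insertions commonly rely on discordant read pairs, in which one mate (the anchor) maps uniquely to the reference genome while its partner aligns to a retrotransposon consensus sequence; anchors piling up at one locus suggest an insertion nearby that the reference lacks. The Python sketch below illustrates only that generic signal and is not SID's algorithm. The `DiscordantPair` record, the `call_candidate_insertions` function, and the `window` and `min_support` thresholds are hypothetical, and read alignment is assumed to have already produced the pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical record for one read pair: the anchor mate maps uniquely
    to the reference, the other mate aligns to a repeat consensus."""
    chrom: str
    anchor_pos: int  # leftmost reference coordinate of the anchor mate
    family: str      # repeat family hit by the other mate (e.g. "AluY")


def call_candidate_insertions(pairs, window=500, min_support=3):
    """Group anchors by (chrom, family), chain positions that lie within
    `window` bp of the previous anchor, and keep chains supported by at
    least `min_support` pairs. Returns (chrom, start, end, family, support)."""
    by_key = defaultdict(list)
    for p in pairs:
        by_key[(p.chrom, p.family)].append(p.anchor_pos)

    calls = []
    for (chrom, family), positions in sorted(by_key.items()):
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)
            else:
                if len(cluster) >= min_support:
                    calls.append((chrom, cluster[0], cluster[-1], family, len(cluster)))
                cluster = [pos]
        # Flush the last open cluster for this (chrom, family) key.
        if len(cluster) >= min_support:
            calls.append((chrom, cluster[0], cluster[-1], family, len(cluster)))
    return calls


if __name__ == "__main__":
    pairs = [
        DiscordantPair("chr1", 10_000, "AluY"),
        DiscordantPair("chr1", 10_120, "AluY"),
        DiscordantPair("chr1", 10_300, "AluY"),
        DiscordantPair("chr1", 50_000, "AluY"),  # isolated anchor, filtered out
        DiscordantPair("chr2", 7_000, "L1HS"),   # isolated anchor, filtered out
    ]
    print(call_candidate_insertions(pairs))
    # [('chr1', 10000, 10300, 'AluY', 3)]
```

Practical callers typically also check mate orientation on each side of the breakpoint, use split reads to refine the insertion point, and discard candidates already annotated in the reference; the sketch omits these steps for brevity.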

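The selection result rests on the site frequency spectrum (SFS) of insertion alleles. Because RIPs are biallelic presence/absence variants, each can be coded per individual as 0, 1, or 2 insertion copies, just like SNP dosages, which is also what lets RIPs serve as SNP-like biomarkers. The Python sketch below builds an unfolded SFS from such a genotype matrix and summarizes it with Tajima's D, a standard statistic for departures from neutrality. It assumes that the insertion is the derived allele (the empty pre-insertion site being ancestral) and that every genotype is called. The function names are hypothetical, and this is not necessarily the statistic the authors used.

```python
import math


def insertion_allele_counts(genotypes):
    """Per-site count of insertion alleles. `genotypes` is a list of sites;
    each site lists per-individual dosages (0, 1 or 2 insertion copies)."""
    return [sum(site) for site in genotypes]


def site_frequency_spectrum(counts, n):
    """Unfolded SFS over n sampled chromosomes: sfs[k] is the number of sites
    whose insertion allele is carried by exactly k chromosomes.
    Monomorphic sites (k == 0 or k == n) are not counted."""
    sfs = [0] * (n + 1)
    for k in counts:
        if 0 < k < n:
            sfs[k] += 1
    return sfs


def tajimas_d(counts, n):
    """Tajima's (1989) D from per-site insertion-allele counts over n
    chromosomes. Returns NaN when no site is segregating."""
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean pairwise difference (pi) versus Watterson's estimator (S / a1).
    pi = sum(2 * k * (n - k) for k in seg) / (n * (n - 1))
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy cohort: 5 diploid individuals (n = 10 chromosomes) in which every
    # insertion is a singleton, i.e. an excess of rare alleles.
    genotypes = [[1, 0, 0, 0, 0] for _ in range(5)]
    counts = insertion_allele_counts(genotypes)
    print(site_frequency_spectrum(counts, n=10))
    print(tajimas_d(counts, n=10))  # negative for a rare-allele excess
```

For the 90-individual cohort, n would be 180 chromosomes. A negative D signals an excess of rare insertions relative to a neutral, constant-size population, but population growth produces the same signature, so D alone cannot separate purifying selection from demography. One common way to account for demography, assumed here for illustration, is to compare the RIP spectrum with that of putatively neutral variants called in the same genomes.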