A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K in human populations

Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; that is, not all individuals carry a provirus at a given genomic location. Because people differ in both the number and the genomic location of these retroviruses, HERV-Ks may contribute to human disease. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population prevalence of HERV-K provirus at each polymorphic site is unknown and because closely related elements such as HERV-K are difficult to identify from short-read sequence data. We present an integrated and computationally robust approach that uses whole-genome short-read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (k-mers) from whole-genome sequence data that match a reference set of k-mers unique to each HERV-K locus, and applies mixture model-based clustering to these values to account for low-depth sequence data. Our analysis of 1000 Genomes Project (KGP) data reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool that depicts the proportion of the KGP populations carrying any combination of polymorphic HERV-K proviruses. Further, because HERV-K is insertionally polymorphic, the genomic burden of known polymorphic HERV-K varies among humans; this burden is lowest in East Asian (EAS) individuals. Our study also identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources to advance research on HERV-K contributions to human diseases.
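The two steps described above, a per-locus k-mer match proportion followed by mixture-model clustering, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "proportion" is the fraction of a locus's reference k-mers seen in an individual's reads, that the reference set has already been filtered to k-mers unique to the locus, and it swaps in a simple two-component Gaussian mixture fitted by EM in place of whatever mixture model the study actually used. All function names and the toy data are hypothetical.

```python
import math

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_proportion(reference_kmers, reads, k):
    """Fraction of a locus's reference k-mers observed in the reads.
    Assumption: reference_kmers is already restricted to locus-unique k-mers."""
    observed = set()
    for read in reads:
        observed |= kmers(read, k) & reference_kmers
    return len(observed) / len(reference_kmers)

def _log_dens(x, means, sds, weights):
    """Log of weight * Gaussian density (up to a shared constant) per component."""
    return [math.log(w) - math.log(s) - 0.5 * ((x - m) / s) ** 2
            for m, s, w in zip(means, sds, weights)]

def fit_two_gaussians(values, iters=100, min_sd=0.01):
    """EM for a 1-D two-component Gaussian mixture (illustrative stand-in).
    Component 0 starts at the smallest value, component 1 at the largest."""
    means = [min(values), max(values)]
    sds, weights = [0.1, 0.1], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in values:
            ld = _log_dens(x, means, sds, weights)
            top = max(ld)
            dens = [math.exp(v - top) for v in ld]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update each component from its weighted points.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
            weights[j] = nj / len(values)
    return means, sds, weights

def assign_clusters(values, means, sds, weights):
    """Label each value with its most probable component (0 or 1)."""
    labels = []
    for x in values:
        ld = _log_dens(x, means, sds, weights)
        labels.append(ld.index(max(ld)))
    return labels

# Toy data: one proportion per individual at a single hypothetical locus.
proportions = [0.02, 0.05, 0.03, 0.90, 0.95, 0.88, 0.04, 0.92]
means, sds, weights = fit_two_gaussians(proportions)
labels = assign_clusters(proportions, means, sds, weights)
```

In this sketch, individuals assigned to the higher-mean component would be read as likely carrying the provirus at that locus. Clustering the proportions across individuals, rather than applying a fixed cutoff, is what lets the boundary adapt when sequencing depth is low and fewer reference k-mers are recovered.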
