Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population

The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often large enough to be detected cytogenetically. Yet because both a reference assembly and informatics tools for measuring this variability are lacking, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease suggest that these regions represent a substantial and biologically important fraction of human sequence variation. There is therefore a pressing, unmet need to detect this uncharacterized sequence variation and incorporate it into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.
