Centromere evolution and CpG methylation during vertebrate speciation

Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and are undergoing speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744 Mbp, together with a resource of twenty-two centromeric regions of length 20–345 kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320–350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation-sensitive SMRT reads, we find that centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.

Here, Ichikawa et al. perform de novo long-read genome assembly of three inbred medaka strains and report the long-range structure of centromeres and their methylation, as well as the correlation of structural variants with differential gene expression.
