Genomic variations and epigenomic landscape of the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel

Background: The teleost medaka (Oryzias latipes) is a well-established vertebrate model system with a long history of genetic research, and high-quality reference genomes are available for several inbred strains. Medaka tolerates inbreeding from the wild well, which makes it possible to establish inbred lines from wild founder individuals.

Results: We exploit this feature to create an inbred panel resource: the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel. This panel of 80 near-isogenic inbred lines contains a large amount of genetic variation inherited from the original wild population. We use Oxford Nanopore Technologies (ONT) long-read data to further investigate the genomic and epigenomic landscapes of a subset of the MIKK panel. Nanopore sequencing allows us to identify a large variety of high-quality structural variants, and we present results and methods using a pan-genome graph representation of 12 individual medaka lines. This graph-based MIKK panel reference genome reveals novel differences between the MIKK panel lines and standard linear reference genomes. We find additional MIKK panel-specific genomic content that linear reference alignment approaches would miss. We also identify and quantify the repeat elements present in each of the lines. Finally, we investigate line-specific CpG methylation and perform differential DNA methylation analysis across these 12 lines.

Conclusions: We present a detailed analysis of the MIKK panel genomes using long- and short-read sequencing technologies, creating a MIKK panel-specific pan-genome reference dataset that allows investigation of novel variation types that would be elusive using standard approaches.
