A draft genome assembly of the Chinese sillago (Sillago sinica), the first reference genome for Sillaginidae fishes

Abstract

Background: Sillaginidae, also known as smelt-whitings, is a family of benthic coastal marine fishes of the Indo-West Pacific with high ecological and economic importance. Many Sillaginidae species, including the Chinese sillago (Sillago sinica), have recently been described in China, providing valuable material for analyzing the genetic diversification of the family. Here, we constructed a reference genome for the Chinese sillago, with the aim of establishing a platform for comparative analysis of all species in this family.

Findings: Using the Pacific Biosciences (PacBio) Sequel single-molecule real-time DNA sequencing platform, we generated ∼27.3 Gb of genomic DNA sequence for the Chinese sillago. We reconstructed a genome assembly of 534 Mb using a strategy that takes advantage of the complementary strengths of two genome assembly programs, Canu and FALCON. The assembly size was consistent with the genome size estimated by k-mer analysis. The assembled genome consisted of 802 contigs with a contig N50 length of 2.6 Mb. We annotated 22,122 protein-coding genes in the Chinese sillago genome using a de novo method together with RNA sequencing data and homology to other teleosts. According to phylogenetic analysis of protein-coding genes, the Chinese sillago is closely related to Larimichthys crocea and Dicentrarchus labrax and diverged from their common ancestor around 69.5–82.6 million years ago.

Conclusions: Using long reads generated with PacBio sequencing technology, we have built a draft genome assembly for the Chinese sillago, which is the first reference genome for a Sillaginidae species. This genome assembly sets the stage for comparative analysis of the diversification and adaptation of fishes in Sillaginidae.
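As a rough illustration of two figures quoted in the abstract, the sketch below shows how a contig N50 and a mean sequencing depth are conventionally computed. It is not the authors' pipeline: the function names `n50` and `mean_coverage` and the toy contig lengths are assumptions made for this example, and only the 27.3 Gb and 534 Mb values come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_coverage(total_bases, genome_size):
    """Return the average sequencing depth (sequenced bases / genome size)."""
    return total_bases / genome_size


# Hypothetical toy contig lengths, not taken from the assembly itself.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Values reported in the abstract: ~27.3 Gb of reads, 534 Mb assembly.
print(round(mean_coverage(27.3e9, 534e6), 1))
```

With the values from the abstract, the second calculation gives a depth of roughly 51-fold. That figure is derived here from the reported totals and is not stated in the source.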
