Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We combined long-read sequencing with transcriptomics, optical and genetic mapping, and shared synteny with closely related fish species to derive a chromosome-level assembly with a contig N50 of over 1 Mb and a scaffold N50 of over 25 Mb, with the scaffolds spanning ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses revealed high levels of genetic diversity and confirmed earlier indications of population stratification into three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish genome assembly, and it will serve as a new standard for fish genomics.
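The contig and scaffold N50 values quoted above follow the usual definition: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The snippet below is a minimal sketch of that calculation, assuming the contig or scaffold lengths are already available as a list of integers. The function name `n50` and the toy lengths are hypothetical and are not taken from the seabass assembly or its pipeline.

```python
from __future__ import annotations


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths (hypothetical helper).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    ordered = sorted(lengths, reverse=True)
    # Walk from the longest sequence down, accumulating bases until
    # the running total reaches half of the assembly size.
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    # Not reached for non-empty input; kept so every path returns an int.
    return ordered[-1]


if __name__ == "__main__":
    # Toy lengths: total = 30 bases; 10 + 6 = 16 >= 15, so N50 = 6.
    print(n50([2, 3, 4, 5, 6, 10]))
```

Comparing against twice the running total avoids floating-point division, and the same function applies unchanged to contigs or scaffolds; only the input lengths differ.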
