Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus

Abstract The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae within the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific mountainous areas of southwestern China and northeastern Vietnam. Over the past several decades, the species has declined rapidly in population size because of illegal poaching and habitat disruption. It is now endangered and has been listed in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) since 1990. A proposal to uplist it to Appendix I was passed at the Seventeenth meeting of the CITES Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (149× depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After removing polymerase chain reaction (PCR) duplicates and low-quality reads, ∼137 Gb of clean data (70× depth) remained for genome assembly. We obtained a draft genome assembly with a total length of 2.24 Gb and a scaffold N50 of 1.47 Mb. The assembled genome was predicted to contain 20,150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. This genomic resource will contribute to deciphering the biology of the Chinese crocodile lizard and provides an essential tool for conservation efforts. It is also a valuable resource for future studies of squamate evolution.
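The depth and N50 figures quoted in the abstract are simple summary statistics. The sketch below illustrates how such values are commonly computed; it is not the authors' pipeline. The function names and the toy scaffold lengths are hypothetical, and the only inputs taken from the abstract are the data volumes (∼291 Gb raw, ∼137 Gb clean) and their reported depths.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


def implied_genome_size_gb(data_gb, depth):
    """Genome size (Gb) implied by a data volume and its reported depth."""
    return data_gb / depth


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy_scaffolds = [100, 50, 25, 25]
print("toy N50:", n50(toy_scaffolds))

# Data volumes and depths as reported in the abstract.
print("size implied by raw data:", round(implied_genome_size_gb(291, 149), 2), "Gb")
print("size implied by clean data:", round(implied_genome_size_gb(137, 70), 2), "Gb")
```

Both depth figures imply roughly the same genome size, which suggests they were computed against a single genome size estimate rather than against the 2.24 Gb assembly length.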
