ToxCodAn: a new toxin annotator and guide to venom gland transcriptomics

MOTIVATION Next-generation sequencing has become exceedingly common and has transformed our ability to explore nonmodel systems. In particular, transcriptomics has facilitated the study of venoms and of toxin evolution in venomous lineages; however, many challenges remain. Chief among them, annotating toxins in a transcriptome is a laborious and time-consuming task. Current annotation software often fails to predict the correct coding sequence and overestimates the number of toxins present in the transcriptome. Here, we present ToxCodAn, a Python script designed to perform precise annotation of snake venom-gland transcriptomes. We test ToxCodAn on a set of previously curated transcriptomes and compare the results with those of other annotators. In addition, we provide a guide to venom-gland transcriptomics to facilitate future research, and we use Bothrops alternatus as a case study for both ToxCodAn and the guide.

RESULTS Our analysis shows that ToxCodAn precisely annotates the toxins present in snake venom-gland transcriptomes. Compared with other annotators, ToxCodAn performs better in run time (more than 20× faster), coding-sequence prediction (more than 3× more accurate), and the number of toxins predicted (more than 4× fewer false positives). ToxCodAn is therefore a valuable resource for toxin annotation. The ToxCodAn framework can be expanded in the future to work with other venomous lineages and to detect novel toxins.
