TOXIFY: a deep learning approach to classify animal venom proteins

In the era of Next-Generation Sequencing and shotgun proteomics, sequences of animal toxigenic proteins are being generated faster than traditional empirical methods can verify their toxicity. To automate toxin identification from protein sequences, we trained Recurrent Neural Networks with Gated Recurrent Units on publicly available datasets. The resulting models are available in the novel software package TOXIFY, which lets users infer the probability that a given protein sequence is a venom protein. TOXIFY is more than 20× faster and uses over an order of magnitude less memory than previously published methods. It is also more accurate, precise, and sensitive than those methods at classifying venom proteins. Availability: https://www.github.com/tijeco/toxify
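To make the architecture concrete, here is a minimal sketch of how a recurrent network with Gated Recurrent Units can turn a protein sequence into a single venom probability. It is not TOXIFY's implementation. It assumes a simple one-hot encoding of the 20 standard amino acids, a single GRU layer, and a logistic output unit. The names `GRUClassifier`, `one_hot`, `predict_proba`, and `hidden_size` are hypothetical, and the weights are random and untrained, so the returned probabilities are illustrative only. The gate equations follow the standard GRU formulation of Cho et al. (2014).

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def one_hot(residue):
    """Encode one residue as a 20-dim one-hot vector; unknown residues map to all zeros."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue.upper())
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


class GRUClassifier:
    """Hypothetical sketch: one GRU layer plus a logistic output unit, with untrained random weights."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        n_in = len(AMINO_ACIDS)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input weights (W_*) and recurrent weights (U_*) for the update gate z,
        # the reset gate r, and the candidate hidden state.
        self.W_z, self.U_z = mat(hidden_size, n_in), mat(hidden_size, hidden_size)
        self.W_r, self.U_r = mat(hidden_size, n_in), mat(hidden_size, hidden_size)
        self.W_h, self.U_h = mat(hidden_size, n_in), mat(hidden_size, hidden_size)
        self.b_z = [0.0] * hidden_size
        self.b_r = [0.0] * hidden_size
        self.b_h = [0.0] * hidden_size
        # Output layer: final hidden state -> one logit
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def step(self, x, h):
        """One GRU time step: returns the new hidden state given input x and previous state h."""
        z = [sigmoid(v) for v in add(matvec(self.W_z, x), matvec(self.U_z, h), self.b_z)]
        r = [sigmoid(v) for v in add(matvec(self.W_r, x), matvec(self.U_r, h), self.b_r)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(v) for v in add(matvec(self.W_h, x), matvec(self.U_h, reset_h), self.b_h)]
        # Interpolate between the previous state and the candidate (Cho et al. 2014 convention)
        return [zi * hi + (1.0 - zi) * ht for zi, hi, ht in zip(z, h, h_tilde)]

    def predict_proba(self, sequence):
        """Run the GRU over the sequence and squash the final state into a probability."""
        h = [0.0] * self.hidden_size
        for residue in sequence:
            h = self.step(one_hot(residue), h)
        logit = sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)
```

For example, `GRUClassifier().predict_proba("MKTLLLTLVVVTIVCLDLGYT")` returns a value strictly between 0 and 1. Because the weights here are untrained, that value is not a meaningful venom score; a real classifier would learn them from labeled venom and non-venom sequences. The GRU reads the sequence one residue at a time, so it handles proteins of any length without fixed-size windows.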