NNTox: Gene Ontology-Based Protein Toxicity Prediction Using Neural Network

With advancements in synthetic biology, the cost and time needed to design and synthesize customized gene products have been steadily decreasing. Many research laboratories in academia and industry routinely create genetically engineered proteins as part of their research activities. However, manipulating protein sequences can result in the unintentional production of toxic proteins. Being able to identify the toxicity of a protein before synthesis would therefore reduce the risk of potential hazards. Existing toxicity predictors are narrowly specialized, which limits their application. Here, we extend general protein function prediction methods to predict the toxicity of proteins. Protein function prediction has been actively studied in the bioinformatics community and has improved significantly over the last decade. We previously developed function prediction methods that were shown to be among the top-performing methods in the community-wide Critical Assessment of Functional Annotation (CAFA) experiment. Building on our function prediction method, we developed a neural network model, named NNTox, which uses the Gene Ontology (GO) terms predicted for a target protein to predict the likelihood that the protein is toxic. We also developed a multi-label model, which predicts the specific toxicity type of the query sequence. Together, this work analyzes the relationship between GO terms and protein toxicity and builds predictive models of protein toxicity.
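
To make the idea of predicting toxicity from predicted GO terms concrete, the following minimal sketch encodes a protein's predicted GO terms as a fixed-length feature vector and passes it through a small feed-forward network. It is an illustration under stated assumptions, not the NNTox implementation: the four-term GO vocabulary, the toxicity-type labels, the layer sizes, the use of per-term confidence scores as features, and the randomly initialized (untrained) weights are all hypothetical.

```python
import math
import random

# Hypothetical GO vocabulary; a real model would cover far more terms.
GO_VOCAB = ["GO:0090729", "GO:0005576", "GO:0003824", "GO:0005515"]
# Hypothetical toxicity-type labels for the multi-label variant.
TOX_LABELS = ["neurotoxin", "hemotoxin"]


def encode(predicted_terms, vocab=GO_VOCAB):
    """Map {GO term: confidence score} to a fixed-length feature vector."""
    return [float(predicted_terms.get(term, 0.0)) for term in vocab]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_out, n_in, rng):
    """Random placeholder weights; a real model would learn these from data."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(features, hidden, output):
    """Forward pass: one ReLU hidden layer, then independent sigmoid outputs."""
    h = dense(features, hidden[0], hidden[1], relu)
    return dense(h, output[0], output[1], sigmoid)


rng = random.Random(0)
hidden = init_layer(3, len(GO_VOCAB), rng)
binary_out = init_layer(1, 3, rng)               # toxic vs. non-toxic
multi_out = init_layer(len(TOX_LABELS), 3, rng)  # one sigmoid per toxicity type

# Hypothetical function-prediction output for a query protein.
query = {"GO:0090729": 0.92, "GO:0005576": 0.75}
x = encode(query)
p_toxic = predict(x, hidden, binary_out)[0]
type_scores = dict(zip(TOX_LABELS, predict(x, hidden, multi_out)))
```

Because the weights here are untrained, `p_toxic` and `type_scores` carry no biological meaning; the sketch only shows the data flow. Using one sigmoid per label, rather than a softmax across labels, lets a protein be assigned more than one toxicity type, which is the usual design choice for multi-label prediction.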
