PredictFP2: A New Computational Model to Predict Fusion Peptide Domain in All Retroviruses

The fusion peptide (FP) is a pivotal domain for the entry of retroviruses into host cells, where they continue self-replication. This crucial role makes FP a promising drug target for therapeutic intervention. The FP model proposed in our previous work is relatively inefficient at predicting FPs in retroviruses. In this work, we therefore propose a new computational model to predict FP domains in all retroviruses. The model predicts FP domains by recognizing their start and end sites separately with a support vector machine (SVM), combined with knowledge of the hydrophobicity of the subdomain around the furin cleavage site. The classification accuracy rates are 91.91, 91.20, and 89.13 percent for the jackknife, 10-fold cross-validation, and 5-fold cross-validation tests, respectively. The model then discovered 69,753 and 493 putative FPs after scanning amino acid sequences and human endogenous retrovirus (HERV) DNA sequences, respectively, neither of which had FP annotations. A subsequent statistical analysis of the 69,753 putative FP sequences confirms that FP is a hydrophobic domain. Lastly, we depicted the distribution of the 493 putative FP sequences across each human chromosome and each HERV family, which shows that the FPs of HERVs probably have chromosome and family preferences.
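
The hydrophobicity idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it replaces the SVM with a plain threshold, and it assumes the Kyte-Doolittle hydropathy scale, the commonly cited furin consensus motif R-X-K/R-R, a 20-residue window, and a cutoff of 1.0, none of which are stated in the abstract. The names `candidate_fp_windows` and `mean_hydropathy` are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index. Assumption: the abstract does not say
# which hydrophobicity scale the model actually uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Furin consensus motif R-X-[K/R]-R; the lookahead allows overlapping matches.
FURIN_MOTIF = re.compile(r"(?=(R.[KR]R))")


def mean_hydropathy(segment):
    """Average Kyte-Doolittle score over the standard residues in a segment."""
    scores = [KD[aa] for aa in segment if aa in KD]
    return sum(scores) / len(scores) if scores else 0.0


def candidate_fp_windows(sequence, window=20, threshold=1.0):
    """Return (start, segment, score) for hydrophobic windows that begin
    immediately after a furin-like motif.

    Illustrative heuristic only: `window` and `threshold` are assumed
    values, and a simple cutoff stands in for the SVM classifier.
    """
    hits = []
    for match in FURIN_MOTIF.finditer(sequence):
        start = match.start() + 4  # first residue after the 4-residue motif
        segment = sequence[start:start + window]
        if len(segment) < window:
            continue  # not enough residues downstream of the motif
        score = mean_hydropathy(segment)
        if score >= threshold:
            hits.append((start, segment, score))
    return hits
```

Calling `candidate_fp_windows(protein_sequence)` on an envelope protein sequence returns the windows that pass the cutoff. In a model like the one described, such hydrophobicity values would more plausibly serve as input features to the start-site and end-site classifiers than as a hard filter.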
