Sequence-based prediction of protein-protein interactions using ensemble based classifier combined with global encoding in HIV (human immunodeficiency virus)

Human Immunodeficiency Virus (HIV) is an obligate intracellular retrovirus that attacks the human immune system, and it does so through interactions between viral and human proteins. This research uses the amino acid sequences of proteins as data. Features are extracted with the Global Encoding method and then combined with a Rotation Forest classifier to predict interactions between HIV and human proteins. The Global Encoding method first groups the 20 amino acids into 6 classes and then forms 10 combinations, each containing three different classes. Based on these 10 combinations, each protein sequence is transformed into 10 binary characteristic sequences. Each characteristic sequence is further divided into several subsets according to a partition method. Two types of descriptor, composition and transition, are then extracted to represent each protein sequence and are used as the final input vectors for the classifier. Finally, Rotation Forest is used to predict the class of each interaction between human and HIV proteins. The best model obtained in this research achieves an accuracy of 79.50%, a sensitivity of 79.91%, a specificity of 79.07%, and a precision of 79.77% in predicting protein interactions between HIV and humans.
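The Global Encoding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The six-class grouping in `CLASSES` is one used in earlier Global Encoding work and is assumed here. Reading each of the 10 combinations as a split of the six classes into two groups of three, the partition rule (10 growing prefixes of each binary sequence), the exact composition and transition formulas in `describe`, and the pair representation in `pair_features` are all assumptions as well. Every function name is hypothetical.

```python
from itertools import combinations

# ASSUMPTION: 6-class grouping of the 20 standard amino acids taken from
# earlier Global Encoding work; the paper's exact grouping is not given here.
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# ASSUMPTION: each combination splits the 6 classes into two halves of three.
# Fixing class 0 in the first half avoids counting a split twice, which
# leaves C(5, 2) = 10 combinations.
COMBINATIONS = [(0,) + pair for pair in combinations(range(1, 6), 2)]


def binary_sequences(seq):
    """Map a protein sequence to 10 binary characteristic sequences."""
    classes = [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]
    if not classes:
        raise ValueError("sequence contains no standard amino acids")
    # 1 if the residue's class lies in the combination's first half, else 0.
    return [[1 if c in combo else 0 for c in classes] for combo in COMBINATIONS]


def describe(bits):
    """Composition (share of 1s and of 0s) and transition (share of 0/1 changes)."""
    n = len(bits)
    ones = sum(bits) / n
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    transition = changes / (n - 1) if n > 1 else 0.0
    return [ones, 1.0 - ones, transition]


def global_encoding(seq, n_parts=10):
    """Descriptors over n_parts growing prefixes of each binary sequence.

    ASSUMPTION: the k-th subset is the first ceil(k * L / n_parts) positions.
    """
    features = []
    for bits in binary_sequences(seq):
        length = len(bits)
        for k in range(1, n_parts + 1):
            end = (k * length + n_parts - 1) // n_parts  # integer ceiling
            features.extend(describe(bits[:end]))
    return features  # 10 combinations * n_parts prefixes * 3 descriptors


def pair_features(human_seq, hiv_seq):
    """ASSUMPTION: a protein pair is encoded by concatenating both vectors."""
    return global_encoding(human_seq) + global_encoding(hiv_seq)
```

For example, `global_encoding("MKTAYIAKQR")` returns 300 values (10 combinations × 10 prefixes × 3 descriptors), and `pair_features` returns 600 values for a pair. The input strings are placeholders, not sequences from the paper's dataset. Using growing prefixes rather than disjoint segments lets every subset keep the start of the sequence, so the descriptors summarise the sequence at several resolutions.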
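The four evaluation figures reported in the abstract follow the standard confusion-matrix definitions. A small helper that computes them from true and predicted labels is sketched below, assuming labels are encoded as 1 for an interacting human-HIV pair and 0 otherwise. The function names are hypothetical, and the Rotation Forest classifier itself is not reproduced here.

```python
def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision for a two-class task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }
```

For example, `binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])` gives an accuracy of 0.75, a sensitivity of 0.5, a specificity of 1.0 and a precision of 1.0. Accuracy is the class-weighted average of sensitivity and specificity, which is why the reported 79.50% falls between 79.07% and 79.91%.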

M. I. S. Musti | Alhadi Bustamam | D. Lestari