Predicting subcellular localization of multisite proteins using differently weighted multi-label k-nearest neighbors sets

BACKGROUND: For a protein to execute its function, correct subcellular localization is essential. In addition to biological experiments, bioinformatics is widely used to predict and determine the subcellular localization of proteins. However, single-feature extraction methods cannot effectively handle the large volume of data or proteins that localize to multiple sites. Thus, we developed a pseudo amino acid composition (PseAAC) method and an entropy density technique to extract fused feature information from multisite proteins.

OBJECTIVE: To predict multiplex protein subcellular localization with high prediction accuracy.

METHOD: To improve the efficiency of predicting multiplex protein subcellular localization, we used the multi-label k-nearest neighbors algorithm and assigned different weights to the various attributes. The method was evaluated with several performance metrics on a dataset of protein sequences with single-site and multisite subcellular localizations.

RESULTS: Evaluation experiments showed that the proposed method significantly improves the best overall accuracy for multiplex protein subcellular localization.

CONCLUSION: This method can help to predict protein subcellular localization more comprehensively and so support a better understanding of protein function, thereby bridging the gap between theory and application and improving the identification and monitoring of drug targets.
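The idea of weighting attributes inside a multi-label nearest-neighbor predictor can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the posterior-probability step of the standard multi-label k-nearest neighbors algorithm with a simple neighbor vote, uses plain amino acid composition instead of PseAAC or entropy density features, and all function names, the weight vector, and the vote threshold are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def weighted_distance(x, y, weights):
    """Euclidean distance in which each attribute is scaled by its own weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def predict_labels(query, train, weights, k=3, threshold=0.5):
    """Predict a set of labels for `query` by voting among its k nearest neighbors.

    `train` is a list of (feature_vector, set_of_labels) pairs. A label is
    predicted when at least `threshold` of the k neighbors carry it
    (a simplified stand-in for the ML-kNN posterior estimate).
    """
    neighbors = sorted(
        train, key=lambda item: weighted_distance(query, item[0], weights)
    )[:k]
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, n in votes.items() if n / len(neighbors) >= threshold}


# Toy data: two-dimensional feature vectors with one or more localization labels.
train = [
    ([0.0, 0.0], {"nucleus"}),
    ([0.1, 0.0], {"nucleus", "cytoplasm"}),
    ([0.0, 0.1], {"nucleus", "cytoplasm"}),
    ([1.0, 1.0], {"membrane"}),
]
predicted = predict_labels([0.05, 0.05], train, weights=[1.0, 1.0], k=3)
```

In this sketch a protein can receive several labels at once, which is what distinguishes multisite prediction from ordinary single-label classification; raising a weight makes differences in that attribute count more when neighbors are selected.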
