Prediction of Subcellular Localization of Multi-site Virus Proteins Based on Convolutional Neural Networks

Predicting the subcellular localization of proteins is critical for analyzing their mechanisms and functions and for biological research more broadly. A series of efficient methods have been proposed to identify subcellular localization, but challenges remain. In this paper, a novel feature extraction method, denoted F-Dipe, is proposed to identify subcellular localization. F-Dipe, which is based on the dipeptide pseudo amino acid composition method, improves the performance of multi-site prediction by increasing the focused information extracted from proteins. In addition, a convolutional neural network (CNN) is used to predict the subcellular localization of multi-site virus proteins. The multi-label k-nearest neighbor algorithm (MLKNN) serves as a base classifier to verify the performance of F-Dipe and the CNN. With the MLKNN predictor, the best overall accuracy of F-Dipe on dataset S is 59.92%, higher than the 57.14% achieved by pseudo amino acid composition (PseAAC) features. With the CNN predictor, the best overall accuracy of F-Dipe on dataset S rises to 62.3%, compared with 59.92% for MLKNN.
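The abstract only states that F-Dipe builds on dipeptide pseudo amino acid composition, so its exact formula is not reproduced here. As a rough illustration of the composition part alone, the sketch below computes the classic 20-dimensional amino acid composition and 400-dimensional dipeptide composition of a protein sequence. It is not F-Dipe: it omits the sequence-order correlation factors and weights that PseAAC-style descriptors add, and the function names are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in a fixed one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides ("AA", "AC", ..., "YY") and their vector positions.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def amino_acid_composition(seq: str) -> list[float]:
    """Normalized frequency of each standard amino acid (20-D).

    Non-standard residues (e.g. 'X', 'U') are ignored.
    """
    counts = [0] * len(AMINO_ACIDS)
    valid = 0
    for aa in seq.upper():
        pos = AMINO_ACIDS.find(aa)
        if pos >= 0:
            counts[pos] += 1
            valid += 1
    return [c / valid if valid else 0.0 for c in counts]


def dipeptide_composition(seq: str) -> list[float]:
    """Normalized frequency of each adjacent residue pair (400-D).

    Pairs that contain a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i : i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def composition_features(seq: str) -> list[float]:
    """Hypothetical 420-D descriptor: 20-D composition followed by 400-D dipeptides."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

For example, `composition_features("ACAC")` gives a 420-element vector in which "A" and "C" each have frequency 0.5, and the dipeptides "AC" and "CA" have frequencies 2/3 and 1/3. A pseudo amino acid composition would append further terms describing longer-range physicochemical correlations along the sequence.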
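MLKNN (ML-KNN, Zhang and Zhou, 2007) assigns each label independently: it counts how many of a sample's k nearest neighbors carry that label and compares smoothed posterior probabilities of "has the label" versus "lacks the label". The following minimal pure-Python sketch assumes Euclidean distance, k = 3 and smoothing s = 1 by default; the abstract does not report the distance measure or parameter values used in the paper, and the class and helper names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(X, x, k, exclude=None):
    """Indices of the k points in X closest to x, optionally skipping one index."""
    candidates = (i for i in range(len(X)) if i != exclude)
    return sorted(candidates, key=lambda i: euclidean(X[i], x))[:k]


class MLKNN:
    """Minimal ML-KNN sketch with Laplace smoothing parameter s."""

    def __init__(self, k=3, s=1.0):
        self.k = k
        self.s = s

    def fit(self, X, Y):
        # X: feature vectors; Y: 0/1 label vectors (one entry per location).
        self.X, self.Y = X, Y
        n, q, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that a sample carries label l.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(q)]
        # c1[l][j] / c0[l][j]: samples with / without label l whose k neighbours
        # (excluding the sample itself) include exactly j samples with label l.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(n):
            nbrs = k_nearest(X, X[i], k, exclude=i)
            for l in range(q):
                j = sum(Y[m][l] for m in nbrs)
                if Y[i][l]:
                    c1[l][j] += 1
                else:
                    c0[l][j] += 1
        # Smoothed likelihoods P(j neighbours with l | has l) and (| lacks l).
        self.like1 = [
            [(s + c1[l][j]) / (s * (k + 1) + sum(c1[l])) for j in range(k + 1)]
            for l in range(q)
        ]
        self.like0 = [
            [(s + c0[l][j]) / (s * (k + 1) + sum(c0[l])) for j in range(k + 1)]
            for l in range(q)
        ]
        return self

    def predict(self, x):
        """Return a 0/1 vector: label l is set when its posterior beats its absence."""
        nbrs = k_nearest(self.X, x, self.k)
        labels = []
        for l in range(len(self.prior)):
            j = sum(self.Y[m][l] for m in nbrs)
            p_has = self.prior[l] * self.like1[l][j]
            p_lacks = (1 - self.prior[l]) * self.like0[l][j]
            labels.append(1 if p_has > p_lacks else 0)
        return labels
```

On a toy set with two well-separated clusters, `MLKNN(k=2).fit(X, Y).predict([0.5, 0.5])` returns the label vector of the nearby cluster. Because each label is decided separately, a protein can be assigned several locations at once, which is what multi-site prediction requires.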
