Gram-positive and gram-negative subcellular localization using rotation forest and physicochemical-based features

Background: The function of a protein depends on its location in the cell. Predicting protein subcellular localization is therefore an important step towards protein function prediction. Recent studies have shown that using Gene Ontology (GO) for feature extraction can improve prediction performance. However, GO annotations are not available for newly sequenced proteins, and in these cases the prediction performance of GO-based methods degrades significantly.

Results: In this study, we develop a method to effectively employ the physicochemical and evolutionary-based information in the protein sequence. To do this, we propose a segmentation-based feature extraction method that explores potential discriminatory information based on the physicochemical properties of the amino acids, and we apply it to Gram-positive and Gram-negative subcellular localization. We explore the proposed feature extraction techniques using 10 attributes that were experimentally selected from a wide range of physicochemical attributes. Finally, by applying the Rotation Forest classification technique to the extracted features, we improve Gram-positive and Gram-negative subcellular localization accuracies by up to 3.4% over previous studies that used GO for feature extraction.

Conclusion: By proposing a segmentation-based feature extraction method that explores potential discriminatory information based on the physicochemical properties of the amino acids, and by using the Rotation Forest classification technique, we are able to significantly enhance Gram-positive and Gram-negative subcellular localization prediction accuracies.
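The general idea of segmentation-based feature extraction can be illustrated with a minimal Python sketch. It is not the scheme used in the study, whose exact definition is not given in this abstract. It assumes the simplest possible variant: the sequence is split into a fixed number of nearly equal segments, and the mean value of one physicochemical attribute is computed per segment. The Kyte-Doolittle hydropathy index is used purely as an example attribute; the function name `segment_means` and the parameter `n_segments` are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used here only as an example attribute.
# The study selects 10 attributes experimentally; this abstract does not name them.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_means(sequence, n_segments=4, scale=HYDROPATHY):
    """Return the mean attribute value of each of n_segments sequence segments.

    Hypothetical simplification: residues missing from `scale` are skipped,
    and segments are contiguous and as equal in length as possible.
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    n = len(values)
    if n < n_segments:
        raise ValueError("sequence is shorter than the number of segments")

    features = []
    for i in range(n_segments):
        # Integer arithmetic keeps the segment boundaries contiguous.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


if __name__ == "__main__":
    # Toy sequence, not taken from any benchmark.
    print(segment_means("MKKLLIAAGVSLLAGCSSNAK", n_segments=3))
```

Repeating this for several attributes and concatenating the results would give one fixed-length feature vector per protein, which could then be passed to any classifier, including a Rotation Forest implementation.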
