A Tri-Gram Based Feature Extraction Technique Using Linear Probabilities of Position Specific Scoring Matrix for Protein Fold Recognition

In the biological sciences, determining the three-dimensional structure of a protein from its sequence is an important and challenging task. Identifying the fold of a protein from its primary sequence is an intermediate step toward discovering that three-dimensional structure. This can be done by using a feature extraction technique to capture the relevant information, and then employing a suitable classifier to label an unknown protein. Several feature extraction techniques have been developed in the past, but their recognition accuracy remains limited. In this study, we develop a feature extraction technique based on tri-grams computed directly from Position Specific Scoring Matrices (PSSMs). The effectiveness of the technique is demonstrated on two benchmark datasets. The proposed technique yields up to a 4.4% improvement in protein fold recognition accuracy over state-of-the-art feature extraction techniques.
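
To make the idea of tri-grams computed directly from a PSSM concrete, the following is a minimal sketch rather than the study's implementation. It assumes the PSSM has already been converted to linear probabilities, with one row per sequence position and one column per amino acid. It also assumes that a tri-gram feature is the sum, over all windows of three consecutive positions, of the product of the three corresponding column probabilities. The function name `trigram_features` and the toy two-column matrix are hypothetical and used only for illustration.

```python
from itertools import product


def trigram_features(pssm):
    """Accumulate tri-gram features from a PSSM of linear probabilities.

    pssm: list of L rows, each a list of A probabilities (A = 20 for
    the standard amino acids). Returns a flat list of A**3 values,
    where index m * A**2 + n * A + r holds the feature for the
    column triple (m, n, r). Assumed layout, not taken from the paper.
    """
    if not pssm:
        return []
    n_cols = len(pssm[0])
    features = [0.0] * (n_cols ** 3)
    # Slide a window over three consecutive sequence positions.
    for i in range(len(pssm) - 2):
        first, second, third = pssm[i], pssm[i + 1], pssm[i + 2]
        # product() iterates with m slowest and r fastest, matching the index.
        for idx, (m, n, r) in enumerate(product(range(n_cols), repeat=3)):
            features[idx] += first[m] * second[n] * third[r]
    return features


# Toy example with a 2-letter alphabet and 4 positions (illustrative only).
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
toy_features = trigram_features(toy_pssm)
```

With 20 amino acid columns this layout gives a 20 × 20 × 20 = 8000-dimensional vector, whose length does not depend on the protein's sequence length. That fixed length is what allows the vector to be passed to a standard classifier.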
