Evidence-theoretic protein fold classification based on the concept of hyperfold.

In current computational biology, assigning a protein domain to a fold class is a complicated and controversial task. It becomes even harder when the fold pattern of a protein domain must be identified solely from information extracted from the protein sequence. To deal with this problem, the concepts of hyperfold and interlaced folds are introduced for the first time. Each hyperfold is a set of interlaced folds with a centroid fold. These concepts are used to construct a framework for handling the uncertainty involved in the fold classification problem. In this approach, an unknown query protein is assigned to a hyperfold rather than to a single fold. Ten different sequence-based features are used to predict the correct hyperfold. The architecture is built on the Dempster-Shafer theory of evidence, using bodies of evidence and Dempster's rule of combination to combine the hyperfolds. The resulting classification architecture was applied to identifying protein folds among the 27 well-known SCOP fold patterns in a stringent, widely used benchmark dataset. Compared with existing predictors tested on the same benchmark dataset, our approach may achieve better results.
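
As an illustration of the combination step named in the abstract, the following is a minimal sketch of Dempster's rule of combination applied to two bodies of evidence. It is not the authors' implementation: the function name `combine_dempster`, the fold labels, and the mass values are hypothetical and chosen only for the example. It assumes each feature yields a mass function that places belief on a set of folds (a hyperfold) and leaves the remainder on the whole frame of discernment.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function is a dict mapping a frozenset of hypotheses
    (focal element) to its mass; the masses are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("Total conflict: the two bodies of evidence cannot be combined.")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Hypothetical frame of discernment with three folds (labels are illustrative).
frame = frozenset({"fold_A", "fold_B", "fold_C"})
hyperfold_ab = frozenset({"fold_A", "fold_B"})
hyperfold_bc = frozenset({"fold_B", "fold_C"})

# Two hypothetical bodies of evidence, e.g. from two sequence-based features.
m_feature1 = {hyperfold_ab: 0.7, frame: 0.3}
m_feature2 = {hyperfold_bc: 0.6, frame: 0.4}

fused = combine_dempster(m_feature1, m_feature2)
for focal_set, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(sorted(focal_set), round(mass, 3))
```

In this toy case the two hyperfolds overlap only in `fold_B`, so the combined evidence concentrates its largest mass on that single fold while still keeping some mass on each hyperfold and on the full frame.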
