Predicting protein subcellular location by the AdaBoost.M1 algorithm

Predicting protein subcellular location is an important task in bioinformatics, because a protein must be located in its proper position in a cell to perform its biological functions; it therefore remains an important and challenging problem in current molecular and cellular biology. In this paper, we propose a computational method based on the AdaBoost.M1 algorithm and pseudo amino acid composition (PseAAC) to identify protein subcellular location. AdaBoost.M1, an algorithm that directly extends the original AdaBoost algorithm to the multi-class case without reducing it to multiple two-class problems, is applied to predict protein subcellular location. Some previous studies represented a protein by its conventional amino acid composition, which ignores sequence-order effects. To take these effects into account, this study instead represents each protein by the PseAAC proposed by Chou. To demonstrate that AdaBoost.M1 is a robust and efficient model for predicting location, we adopt the same protein dataset used by Cedano et al. in 1997. The results show that this method achieves higher accuracy than the methods used by previous researchers and that it can be put into practical use for prediction.
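The multi-class boosting loop described above can be illustrated with a minimal NumPy sketch of AdaBoost.M1 as given by Freund and Schapire [16]; it is not the authors' implementation. The abstract does not say which weak learner was used, so the decision stump (`Stump`, `fit_stump`) is an assumption chosen for brevity, and `adaboost_m1` and `predict` are hypothetical names. Each round fits a stump under the current weights D_t, measures its weighted error ε_t, sets β_t = ε_t / (1 - ε_t), shrinks the weights of correctly classified samples by β_t, and the final classifier takes a vote weighted by log(1/β_t) over all classes at once, with no one-vs-rest decomposition.

```python
import numpy as np


class Stump:
    """Depth-1 tree: compare one feature with a threshold, emit one class per side."""

    def __init__(self, feature, threshold, left_label, right_label):
        self.feature, self.threshold = feature, threshold
        self.left_label, self.right_label = left_label, right_label

    def predict(self, X):
        return np.where(X[:, self.feature] <= self.threshold,
                        self.left_label, self.right_label)


def fit_stump(X, y, D, n_classes):
    """Exhaustively pick the stump with the lowest weighted error under weights D."""
    n, d = X.shape
    Y = np.zeros((n, n_classes))
    Y[np.arange(n), y] = D                  # weighted one-hot labels
    total = Y.sum(axis=0)
    best, best_err = None, np.inf
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")
        xs, cum = X[order, j], np.cumsum(Y[order], axis=0)
        for i in range(n - 1):
            if xs[i] == xs[i + 1]:
                continue                    # no threshold fits between equal values
            left, right = cum[i], total - cum[i]
            err = D.sum() - left.max() - right.max()
            if err < best_err:
                best_err = err
                best = Stump(j, (xs[i] + xs[i + 1]) / 2,
                             int(left.argmax()), int(right.argmax()))
    if best is None:                        # every feature constant: majority class
        c = int(total.argmax())
        best = Stump(0, np.inf, c, c)
    return best


def adaboost_m1(X, y, n_classes, T=50):
    """AdaBoost.M1 with decision stumps as the (assumed) weak learner."""
    n = len(y)
    D = np.full(n, 1.0 / n)                 # D_1(i) = 1/N
    learners, alphas = [], []
    for _ in range(T):
        h = fit_stump(X, y, D, n_classes)
        miss = h.predict(X) != y
        eps = D[miss].sum()                 # weighted training error epsilon_t
        if eps >= 0.5:                      # weak-learning condition violated: stop
            break
        beta = max(eps, 1e-10) / (1.0 - eps)  # clamp so log(1/beta) stays finite
        learners.append(h)
        alphas.append(np.log(1.0 / beta))
        if eps == 0.0:                      # perfect learner: further rounds add nothing
            break
        D = np.where(miss, D, D * beta)     # shrink weights of correctly classified samples
        D /= D.sum()
    return learners, alphas


def predict(learners, alphas, X, n_classes):
    """h_fin(x) = argmax_y of the sum of log(1/beta_t) over rounds where h_t(x) = y."""
    votes = np.zeros((X.shape[0], n_classes))
    rows = np.arange(X.shape[0])
    for h, a in zip(learners, alphas):
        votes[rows, h.predict(X)] += a
    return votes.argmax(axis=1)
```

In the setting of the abstract, `X` would hold one PseAAC vector per protein and `y` the integer-coded subcellular locations. The early stop at ε_t ≥ 1/2 reflects AdaBoost.M1's requirement that every weak learner beat 1/2 error even when there are many classes, which is why a reasonably strong weak learner is usually needed in multi-class problems.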
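Chou's PseAAC [8] augments the 20 amino acid frequencies with λ sequence-order correlation factors. The sketch below shows the original (type-1) form under stated assumptions: each physicochemical property is standardized over the 20 amino acids, the factor θ_j averages the squared property differences between residues j positions apart, θ_j = (1/(L - j)) Σ_{i=1}^{L-j} Θ(R_i, R_{i+j}), and all 20 + λ components are divided by Σf + wΣθ. The weight `w = 0.05`, the defaults for `lam`, the function name `pseaac` and the `TOY_PROPERTIES` table are illustrative assumptions; the toy values are made up and are not the hydrophobicity, hydrophilicity and side-chain mass scales used in Chou's formulation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical property tables for illustration only; real PseAAC uses published scales.
TOY_PROPERTIES = {
    "toy_a": {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)},
    "toy_b": {aa: float((3 * i) % 20) for i, aa in enumerate(AMINO_ACIDS)},
}


def pseaac(seq, properties, lam=2, w=0.05):
    """Type-1 pseudo amino acid composition: returns a vector of 20 + lam features."""
    # Standardize each property over the 20 amino acids (zero mean, unit std).
    H = []
    for table in properties.values():
        raw = np.array([table[a] for a in AMINO_ACIDS], dtype=float)
        H.append((raw - raw.mean()) / raw.std())
    H = np.array(H)                                   # shape: (n_properties, 20)
    idx = [AMINO_ACIDS.index(a) for a in seq]         # ValueError on non-standard residues
    L = len(idx)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    f = np.bincount(idx, minlength=20) / L            # normalized occurrence frequencies
    theta = np.empty(lam)
    for j in range(1, lam + 1):
        a, b = H[:, idx[:L - j]], H[:, idx[j:]]       # properties of residue pairs (i, i + j)
        # Theta(R_i, R_{i+j}) averages over properties; theta_j averages over pairs.
        theta[j - 1] = np.mean(np.mean((b - a) ** 2, axis=0))
    denom = f.sum() + w * theta.sum()
    return np.concatenate([f, w * theta]) / denom


features = pseaac("MKTAYIAKQR", TOY_PROPERTIES, lam=3)
print(features.shape)  # (23,)
```

Because the first 20 components carry the composition and the last λ carry the weighted sequence-order terms, the vector sums to 1; a sequence made of one repeated residue has all θ_j = 0 and reduces to plain amino acid composition.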

[1]  Kuo-Chen Chou, et al. Predicting protein structural class with AdaBoost Learner. 2006, Protein and Peptide Letters.

[2]  K. Nishikawa, et al. Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. 1994, Journal of Molecular Biology.

[3]  P. Aloy, et al. Relation between amino acid composition and cellular location of proteins. 1997, Journal of Molecular Biology.

[4]  Rolf Apweiler, et al. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. 2000, Nucleic Acids Res.

[5]  P. Y. Chou, et al. Prediction of Protein Structural Classes from Amino Acid Compositions. 1989.

[6]  H.-B. Shen, et al. Using ensemble classifier to identify membrane protein types. 2006, Amino Acids.

[7]  C. Zhang, et al. A joint prediction of the folding types of 1490 human proteins from their genetic codons. 1993, Journal of Theoretical Biology.

[8]  K. Chou. Prediction of protein cellular attributes using pseudo-amino acid composition. 2001, Proteins.

[9]  B. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. 1975, Biochimica et Biophysica Acta.

[10]  Hao Lin, et al. Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. 2008, Protein and Peptide Letters.

[11]  K. Chou. Progress in protein structural class prediction and its impact to bioinformatics and proteomics. 2005, Current Protein & Peptide Science.

[12]  G. Fasman. Prediction of Protein Structure and the Principles of Protein Conformation. 2012, Springer US.

[13]  K. Chou, et al. Prediction of protein structural classes. 1995, Critical Reviews in Biochemistry and Molecular Biology.

[14]  K. Chou, et al. Prediction of membrane protein types and subcellular locations. 1999, Proteins.

[15]  C. Zhang, et al. Predicting protein folding types by distance functions that make allowances for amino acid interactions. 1994, The Journal of Biological Chemistry.

[16]  Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting. 1997, EuroCOLT.