ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition

Knowing the type of an uncharacterized membrane protein often provides a useful clue in both basic research and drug discovery. With the explosion of protein sequences generated in the post-genomic era, determining membrane protein types by experimental methods is expensive and time-consuming. It has therefore become important to develop automated methods for identifying the likely type of a membrane protein. To this end, various computational methods for membrane protein type prediction have been proposed. They extract feature vectors such as PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix) to represent the protein sequence, and then learn a distance metric for a KNN (K nearest neighbor) or NN (nearest neighbor) classifier that predicts the final type. Most of these metrics are learned with linear dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Such a metric is shared by all the proteins in the dataset. In effect, these methods assume that all proteins follow a single uniform distribution that a linear dimensionality reduction algorithm can capture. We question this assumption and instead learn local metrics, each optimized for a local subset of the proteins. The metric learning procedure is iterated together with protein clustering. A novel ensemble distance metric is then obtained by combining the local metrics through Tikhonov regularization. Experimental results on a benchmark dataset demonstrate the feasibility and effectiveness of the proposed algorithm, named ProClusEnsem.
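As a rough illustration of the conventional pipeline described above (a feature vector, one global linear metric, then a nearest neighbor classifier), the following minimal sketch may help. It is not the ProClusEnsem algorithm: it uses plain amino acid composition as a simplified stand-in for PseAAC, PCA as the single global metric, and a 1-NN classifier. All function names, the toy sequences, and the labels "type-A" and "type-B" are hypothetical and chosen only for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the 20-D residue frequency vector (plain AAC, a stand-in for PseAAC)."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def fit_pca(X, n_components):
    """Learn one global linear projection shared by every protein in X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def predict_nn(X_train, y_train, X_test, mean, W):
    """Give each test row the label of its nearest training row after projection."""
    Z_train = (X_train - mean) @ W
    Z_test = (X_test - mean) @ W
    # Squared Euclidean distances between every test row and every training row.
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    return [y_train[i] for i in d2.argmin(axis=1)]


# Toy data (hypothetical sequences and labels, not from any benchmark).
train_seqs = ["LLLLLLLLAV", "LLLLLLLAVV", "KKKKKKKKDE", "KKKKKKKDEE"]
train_labels = ["type-A", "type-A", "type-B", "type-B"]
test_seqs = ["LLLLLLAAVV", "KKKKKKDDEE"]

X_train = np.array([amino_acid_composition(s) for s in train_seqs])
X_test = np.array([amino_acid_composition(s) for s in test_seqs])

mean, W = fit_pca(X_train, n_components=2)
predictions = predict_nn(X_train, train_labels, X_test, mean, W)
print(predictions)
```

Because `fit_pca` returns a single projection for the whole training set, every protein is compared under the same metric. That is the global assumption the abstract questions; a local-metric approach would instead fit a separate projection on each cluster of proteins and then combine them.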
