Prediction of Protein Quaternary Structure by a Novel Manifold Learning Algorithm

With the explosion of protein sequences generated in the post-genomic age, there is an urgent need for an automated method to predict protein quaternary structure. To address this problem, we represent each protein sample by a sequence-encoding descriptor that fuses PseAA (Pseudo Amino Acid) and DC (Dipeptide Composition). We then introduce a different kind of technique, the manifold learning algorithm MVP (Maximum Variance Projection), to extract the key features from this high-dimensional feature space. The dimension-reduced descriptor vector thus obtained is a compact representation of the original high-dimensional vector. Our jackknife test results indicate that dimensionality reduction approaches are very promising for complicated problems in biological systems, such as predicting the quaternary structure of proteins.
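
As an illustration of the DC part of the descriptor, the following minimal sketch computes a dipeptide composition vector under the standard definition: the normalized frequencies of the 400 ordered pairs of adjacent residues over the 20 standard amino acids. The function name `dipeptide_composition`, the alphabetical ordering of the components, and the choice to skip pairs containing non-standard residues are assumptions made for this example and are not taken from the paper. The PseAA part and the MVP projection are not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides and their position in the output vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of adjacent-pair frequencies.

    Pairs containing a non-standard residue (e.g. 'X') are skipped,
    which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 along the sequence.
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")
    print(len(vec), vec[INDEX["AC"]], vec[INDEX["CA"]])
```

For the toy sequence `ACAC` the adjacent pairs are AC, CA and AC, so only two of the 400 components are non-zero. Real protein sequences populate far more components, which is why a dimensionality reduction step such as MVP is applied to the fused vector.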