Variable predictive model based classification algorithm for effective separation of protein structural classes

Variable predictive model based class discrimination (VPMCD) algorithm is proposed as an effective protein secondary structure classification tool. The algorithm mathematically represents the characteristics amino acid interactions specific to each protein structure and exploits them further to distinguish different structures. The new concept and the VPMCD classifier are established using well-studied datasets containing four protein classes as benchmark. The protein samples selected from SCOP and PDB databases with varying homology (25-100%) and non-uniform distribution of class samples provide challenging classification problem. The performance of the new method is compared with advanced classification algorithms like component coupled, SVM and neural networks. VPMCD provides superior performance for high homology datasets. 100% classification is achieved for self-consistency test and an improvement of 5% prediction accuracy is obtained during Jackknife test. The sensitivity of the new algorithm is investigated by varying model structures/types and sequence homology. Simpler to implement VPMCD algorithm is observed to be a robust classification technique and shows potential for effective extensions to other clinical diagnosis and data mining applications in biological systems.