Support vector machine for discrimination of thermophilic and mesophilic proteins based on amino acid composition.
Identifying thermostability from amino acid sequence information would be helpful in computational screening for thermostable proteins. We have developed a method based on support vector machines (SVMs) to discriminate thermophilic from mesophilic proteins. Under self-consistency validation, 5-fold cross-validation, and independent testing with other datasets, this module achieved overall accuracies of 94.2%, 90.5%, and 92.4%, respectively. Under all three validation methods, the SVM-based module outperformed classifiers built with alternative machine learning and statistical algorithms, including artificial neural networks, Bayesian statistics, and decision trees. The influence of protein size on prediction accuracy was also addressed.
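As an illustration of the kind of input such a classifier uses, the following minimal sketch computes an amino acid composition vector from a sequence. It is not the authors' implementation: the function name `amino_acid_composition`, the alphabetical ordering of residues, the choice to ignore non-standard characters, and the example sequence are all assumptions made here for illustration.

```python
from collections import Counter

# The 20 standard amino acids by one-letter code, in a fixed order
# (the ordering is an assumption of this sketch, not taken from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues (e.g. 'X') are ignored;
    this handling is an assumption of the sketch.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = amino_acid_composition("MKVLAAGEEK")
```

Each protein is thereby reduced to a fixed-length vector of 20 fractions regardless of its size, which is the sort of representation an SVM or any of the compared classifiers could be trained on.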