Use Chou’s 5-Steps Rule to Predict Remote Homology Proteins by Merging Grey Incidence Analysis and Domain Similarity Analysis

Detecting remote homology proteins is a challenging problem for both basic research and drug development. Although several methods address this problem, the benchmark datasets on which they were trained and tested contain many highly homologous samples, as reflected by the fact that the cutoff threshold was set at 95%. In this study, we reconstructed the benchmark dataset with the threshold set at 40%, meaning that no protein in the benchmark dataset has more than 40% pairwise sequence identity with any other protein in the same subset. Using the new benchmark dataset, we propose a new predictor called “dRHP-GreyFun”, based on grey modeling and the functional domain approach. Rigorous cross-validation indicates that the new predictor is superior to its counterparts in both enhancing success rates and reducing computational cost. The predictor can be downloaded from https://github.com/jcilwz/dRHP-GreyFun.
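
To make the 40% cutoff concrete, the following is a minimal sketch of a greedy redundancy filter over one subset. It is an illustration only and not the procedure used to build the benchmark: the function names `percent_identity` and `filter_redundant` are hypothetical, the sequences are assumed to be already aligned to equal length, and identity is assumed to be the fraction of matching columns. In practice a dedicated alignment or clustering tool would likely be used for this step.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A gap aligned against a residue counts as a mismatch.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def filter_redundant(seqs: Dict[str, str], cutoff: float = 40.0) -> List[str]:
    """Greedily keep sequences whose identity to every kept sequence is <= cutoff."""
    kept: List[str] = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy subset; real entries would be full protein sequences.
subset = {
    "p1": "ACDEFGHIKL",
    "p2": "ACDEFGHIKV",  # 90% identical to p1, so it is dropped
    "p3": "ACDEWWWWWW",  # exactly 40% identical to p1, so it is kept
}
kept_names = filter_redundant(subset)
```

The greedy pass depends on input order: an earlier sequence is always preferred over a later one that is too similar to it. The comparison is `<= cutoff` because the text excludes only pairs with more than 40% identity.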