Detecting remote homology proteins is a challenging
problem for both basic research and drug development. Although a few methods
address this problem, the benchmark datasets on which they were trained and
tested contain many highly homologous samples, as reflected by the fact that
the sequence identity cutoff threshold was set at 95%. In
this study, we reconstructed the benchmark dataset by setting the threshold at
40%, meaning that no protein in the benchmark dataset shares more than 40%
pairwise sequence identity with any other protein in the same subset. Using
the new benchmark dataset, we developed a new predictor called “dRHP-GreyFun”,
based on grey modeling and the functional domain approach. Rigorous
cross-validation indicated that the new predictor is superior to its
counterparts in both enhancing success rates and reducing computational cost.
The predictor can be downloaded from https://github.com/jcilwz/dRHP-GreyFun.
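
To make the 40% criterion concrete, the following is a minimal sketch of a greedy redundancy filter, not the procedure actually used to build the benchmark. It assumes the sequences in a subset are already aligned to equal length; the function names `aligned_identity` and `filter_by_identity` and the toy sequences are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List


def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment and have equal
    length; columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def filter_by_identity(
    seqs: Dict[str, str],
    cutoff: float = 0.40,
    identity: Callable[[str, str], float] = aligned_identity,
) -> List[str]:
    """Greedily keep sequences so that no kept pair exceeds the cutoff.

    A sequence is kept only if its identity to every previously kept
    sequence is at most `cutoff` (0.40 mirrors the 40% threshold).
    """
    kept: List[str] = []
    for name, seq in seqs.items():
        if all(identity(seq, seqs[other]) <= cutoff for other in kept):
            kept.append(name)
    return kept


# Toy subset (hypothetical sequences, not taken from the benchmark).
subset = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKV",  # 90% identical to "a", so it is dropped
    "c": "WWWWWWWIKL",  # 30% identical to "a", so it is kept
}
kept_names = filter_by_identity(subset)
```

The greedy order matters: which member of a redundant pair survives depends on the iteration order, so a real pipeline would typically fix that order (for example, by sequence length) and compute identity with a proper pairwise alignment tool.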