An Ensemble Framework for Improving the Prediction of Deleterious Synonymous Mutation

In recent years, many studies have uncovered associations between synonymous mutations (SMs) and human diseases. Identifying deleterious SMs remains a challenge in medical genomics: although several computational methods have been proposed in recent years, precise prediction is still difficult. In this work, we propose EnDSM, an accurate predictor based on an ensemble framework. We explored multimodal features in four groups (functional score, conservation, splicing, and sequence features) and trained eight conceptually different machine learning classifiers on each group, resulting in 32 base classification models. We then selected four base models according to their prediction performance and used their predicted probabilities as the input feature vector of a logistic regression classifier to construct the ensemble learning model. The results suggest that EnDSM achieves better performance than other state-of-the-art predictors on both the training and independent test datasets. We anticipate that EnDSM will become a valuable tool for deleterious SM prediction. The EnDSM server interface and the benchmarking datasets are freely available at http://bioinfo.ahu.edu.cn/EnDSM.
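The ensemble procedure described above (base models trained per feature group, selection by performance, and a logistic regression over the selected models' probabilities) can be illustrated with a minimal sketch. It is not the EnDSM implementation. It rests on several assumptions: the data are synthetic, the per-group signal strengths are arbitrary and say nothing about the real feature groups, a single plain logistic regression stands in for the eight classifier types, only four candidate models exist (one per group) and the top two are kept, and selection uses accuracy on a held-out split. All function names (`fit_logreg`, `make_split`, `run_demo`) are hypothetical.

```python
import math
import random

# The four feature groups named in the abstract.
GROUPS = ["functional_score", "conservation", "splicing", "sequence"]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(probs, y):
    return sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)


def make_split(rng, n, signal):
    """Synthetic data: two features per group, mean-shifted by the label."""
    y = [rng.randint(0, 1) for _ in range(n)]
    X = {g: [[yi * signal[g] + rng.gauss(0, 1) for _ in range(2)] for yi in y]
         for g in GROUPS}
    return X, y


def run_demo(seed=0, top_k=2):
    rng = random.Random(seed)
    # Arbitrary signal strengths (assumption, purely illustrative).
    signal = dict(zip(GROUPS, [1.5, 1.0, 0.5, 0.0]))
    (Xtr, ytr), (Xva, yva), (Xte, yte) = (
        make_split(rng, 300, signal) for _ in range(3)
    )

    # 1. Train one base model per feature group on the training split.
    base = {g: fit_logreg(Xtr[g], ytr) for g in GROUPS}

    # 2. Score every base model on the held-out split and keep the best top_k.
    val_p = {g: predict_proba(base[g], Xva[g]) for g in GROUPS}
    selected = sorted(GROUPS, key=lambda g: accuracy(val_p[g], yva),
                      reverse=True)[:top_k]

    # 3. Fit the meta-learner on the selected models' held-out probabilities.
    meta_X = [[val_p[g][i] for g in selected] for i in range(len(yva))]
    meta = fit_logreg(meta_X, yva)

    # 4. Evaluate the stacked ensemble on the untouched test split.
    test_p = {g: predict_proba(base[g], Xte[g]) for g in selected}
    stacked = [[test_p[g][i] for g in selected] for i in range(len(yte))]
    ens_probs = predict_proba(meta, stacked)
    return selected, accuracy(ens_probs, yte), ens_probs


if __name__ == "__main__":
    selected, acc, _ = run_demo()
    print("selected base models:", selected)
    print("ensemble test accuracy:", round(acc, 3))
```

In this sketch the meta-learner is fitted on probabilities from samples the base models never saw, which avoids feeding it overconfident training-set predictions. Reusing the same held-out split for both model selection and meta-learner fitting is a simplification; cross-validated out-of-fold probabilities would be a common alternative.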