frDriver: A Functional Region Driver Identification for Protein Sequence

Identifying cancer drivers is a crucial challenge in explaining the mechanisms underlying cancer development. Many methods identify cancer drivers at the level of a single mutation site or an entire gene, but they overlook the large number of medium-sized functional elements. It is hypothesized that mutations occurring in different regions of a protein sequence have different effects on the progression of cancer. Here, we develop a novel functional region driver (frDriver) identification method, based on Bayesian probability and multiple linear regression models, to identify protein regions that can regulate gene expression levels and have high functional impact potential. Combining gene expression data and somatic mutation data, with functional impact scores (SIFT, PROVEAN) as prior knowledge, we identify the cancer driver regions that most accurately predict gene expression levels. We evaluated the performance of frDriver on the BRCA and GBM datasets from TCGA. The results show that frDriver identified known cancer drivers and outperformed three other state-of-the-art methods (eDriver, ActiveDriver and OncodriveCLUST). In addition, KEGG pathway and GO term enrichment analyses indicated that the cancer drivers predicted by frDriver are related to processes such as cancer formation and gene regulation.
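
The idea of combining a regression on expression with a functional impact prior can be illustrated with a small sketch. This is not the frDriver implementation: the function name `score_regions`, the binary sample-by-region mutation matrix, the use of absolute regression coefficients as evidence, and the simple "prior times evidence, then normalize" weighting are all assumptions made here for illustration. The abstract does not specify how the Bayesian and regression components are actually combined.

```python
import numpy as np


def score_regions(mutations, expression, prior):
    """Toy region scoring (hypothetical, not the published frDriver model).

    mutations  : samples x regions matrix, 1 if the region is mutated in a sample
    expression : expression level of the gene of interest, one value per sample
    prior      : per-region prior weight, e.g. derived from SIFT/PROVEAN scores
    """
    X = np.asarray(mutations, dtype=float)
    y = np.asarray(expression, dtype=float)
    p = np.asarray(prior, dtype=float)

    # Multiple linear regression: expression ~ intercept + all regions jointly.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Assumption: the absolute coefficient stands in for regression evidence.
    evidence = np.abs(coef[1:])

    # Assumption: weight the evidence by the prior and normalize to sum to 1.
    weighted = p * evidence
    total = weighted.sum()
    return weighted / total if total > 0 else weighted


if __name__ == "__main__":
    # Synthetic data: expression depends only on the first region.
    mut = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
    expr = [3, 1, 3, 1, 3, 1]
    print(score_regions(mut, expr, prior=[0.5, 0.5]))
```

In this toy setting the region whose mutation status tracks expression receives nearly all of the weight. A real analysis would also need to account for noise, covariates, and regions that are rarely mutated.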