Predicting Hepatoma-Related Genes Based on Representation Learning of PPI Network and Gene Ontology Annotations

Hepatoma is the most common type of primary liver cancer and has a high mortality rate worldwide. The genetic causes of its pathology remain largely unknown, so effective discovery of hepatoma-associated genes is important for disease prevention, early diagnosis, and therapeutic treatment. With the development of molecular networks, graph-based methods have been highly successful in predicting disease genes under the guilt-by-association hypothesis, which holds that genes interacting with known disease genes are themselves likely to be disease-related. Network representation learning (NRL) techniques have accelerated disease gene discovery in recent years because of their powerful ability to extract network features. However, current NRL-based methods for disease gene discovery do not consider gene features derived from Gene Ontology (GO) annotations, which a priori group genes with similar functions. To fill this gap, we propose a novel framework that predicts hepatoma-related genes based on representation learning from both the protein-protein interaction (PPI) network and GO annotations. Our framework has three steps: learning features from the PPI network and GO annotations using NRL techniques, integrating the different features with an autoencoder, and predicting hepatoma-related genes using machine learning classifiers. Experiments demonstrate that our framework accurately predicts hepatoma-related genes, with AUROC and AUPRC reaching 0.93 and 0.94, respectively. Our framework also outperforms methods that use only a single type of representation feature for hepatoma gene prediction.
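
The three steps of the framework can be illustrated with a minimal sketch on toy data. Everything in it is an assumption made for illustration: the eight-gene network, the annotation matrix, and the labels are invented; a truncated SVD stands in for the NRL step; and a linear autoencoder and a plain logistic regression stand in for the autoencoder and classifiers, whose actual architectures are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy inputs (hypothetical; not data from the study) ---
n_genes = 8
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4),
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)]
adj = np.zeros((n_genes, n_genes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0  # undirected PPI network

# Gene-by-GO-term annotation matrix (1 = gene annotated with the term).
go = np.array([[1, 1, 0, 0, 0], [1, 0, 1, 0, 0], [1, 1, 1, 0, 0],
               [0, 1, 1, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 1, 1],
               [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]], dtype=float)

# Hypothetical labels: 1 = known disease gene, 0 = other gene.
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)


def svd_embed(matrix, dim):
    """Step 1 stand-in for NRL: low-rank embedding from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def train_linear_autoencoder(x, hidden, epochs=1000, lr=0.05):
    """Step 2: linear autoencoder fitted by gradient descent on squared error."""
    w_enc = rng.normal(scale=0.1, size=(x.shape[1], hidden))
    w_dec = rng.normal(scale=0.1, size=(hidden, x.shape[1]))
    losses = []
    for _ in range(epochs):
        z = x @ w_enc                 # encode
        err = z @ w_dec - x           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared error, up to a constant factor.
        grad_dec = z.T @ err / len(x)
        grad_enc = x.T @ (err @ w_dec.T) / len(x)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, losses


def train_logistic(x, y, epochs=2000, lr=0.5):
    """Step 3: logistic regression fitted by gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b


# Step 1: one embedding per data source.
ppi_feat = svd_embed(adj, dim=3)
go_feat = svd_embed(go, dim=3)

# Step 2: concatenate both feature sets and compress them jointly.
x = np.hstack([ppi_feat, go_feat])
w_enc, losses = train_linear_autoencoder(x, hidden=4)
z = x @ w_enc  # integrated gene representation

# Step 3: score every gene with the classifier.
w, b = train_logistic(z, labels)
scores = 1.0 / (1.0 + np.exp(-(z @ w + b)))
print(np.round(scores, 2))
```

In this sketch the classifier is fitted and scored on the same eight genes, so the printed scores only show how the pieces connect; a real evaluation would use held-out genes and report metrics such as AUROC and AUPRC.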