Identification of a gene signature associated with type 2 diabetes mellitus by integrating mutation and expression data

Type 2 diabetes mellitus (T2DM) is a common chronic disease, and early diagnosis can greatly benefit the treatment of T2DM patients. With the development of sequencing technology, a large number of differentially expressed genes have been identified from expression data. However, machine-learning methods applied to expression data alone may converge only on a locally optimal gene set as the signature. Inherited mutation information can better reflect the relationship between genes and disease, so integrating it should allow the signature to be identified more accurately. To this end, we integrated genome-wide association study (GWAS) data and expression data through expression quantitative trait loci (eQTL) analysis to obtain a T2DM predictive signature (T2DMSig-10). First, we used GWAS data to obtain a list of T2DM susceptibility loci. Then, we used eQTL analysis to map risk single nucleotide polymorphisms (SNPs) to genes and, by combining the results with pancreatic $\beta$-cell gene expression data, obtained 10 protein-coding genes. Next, we combined these genes with equal weights. The signature was validated by receiver operating characteristic (ROC) analysis, single-gene removal, gene ontology (GO) functional enrichment and protein-protein interaction (PPI) network analysis. The results showed that T2DMSig-10 had excellent predictive performance for T2DM (AUC = 0.99) and was highly robust. In short, we obtained a predictive signature for T2DM and further analyzed and validated it.
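The abstract states that the 10 genes are combined with equal weights and that the signature is assessed by ROC analysis and single-gene removal, but it does not give the exact scoring formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: each gene is z-scored across samples, multiplied by an assumed direction (+1 if up-regulated in T2DM, -1 if down-regulated), and the signed values are averaged with equal weight. AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count 0.5), and "single-gene removal" is assumed to mean recomputing the AUC after dropping each gene in turn. The gene names (`GENE_A` and so on), helper functions and toy data are hypothetical and do not come from the study.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def signature_score(expr, directions):
    """Equal-weight signature score per sample (assumed scheme).

    expr: dict gene -> list of expression values, same sample order for every gene
    directions: dict gene -> +1 (assumed up in T2DM) or -1 (assumed down in T2DM)
    """
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    z = {g: zscore(expr[g]) for g in genes}
    # Every gene contributes with the same weight 1/len(genes).
    return [mean(directions[g] * z[g][i] for g in genes) for i in range(n_samples)]


def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 1 = T2DM, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def leave_one_gene_out(expr, directions, labels):
    """AUC of the signature recomputed with each gene removed in turn."""
    result = {}
    for gene in expr:
        reduced = {g: v for g, v in expr.items() if g != gene}
        result[gene] = roc_auc(signature_score(reduced, directions), labels)
    return result


# Hypothetical toy data: 2 controls followed by 2 T2DM samples.
labels = [0, 0, 1, 1]
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],  # hypothetical gene, higher in cases
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # hypothetical gene, lower in cases
    "GENE_C": [2.0, 1.0, 2.0, 1.0],  # hypothetical gene, uninformative
}
directions = {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1}

if __name__ == "__main__":
    print("full signature AUC:", roc_auc(signature_score(expr, directions), labels))
    print("leave-one-gene-out AUC:", leave_one_gene_out(expr, directions, labels))
```

The pairwise AUC is quadratic in sample size, which is fine for small cohorts; for larger data, a library routine such as scikit-learn's `roc_auc_score` gives the same value. A signature whose AUC changes little when any single gene is dropped is what "robust" would mean under this assumed reading of the single-gene removal test.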