Analysis for Disease Gene Association Using Machine Learning

To understand the basis of a disease, it is essential to determine its underlying genes. Understanding the association between genes and genetic disease is a fundamental problem in human health. Identifying the genes associated with a disease requires time-consuming and expensive experiments on a large number of potential candidate genes. Inexpensive and rapid computational methods have therefore been proposed to identify candidate genes associated with a disease. Most of these methods use either phenotypic similarities, on the grounds that genes causing the same or similar diseases show less variation in their sequences, or network properties of protein-protein interactions, on the premise that genes causing the same or similar diseases lie closer together in the protein interaction network. However, these methods use only basic network properties (topological features) and gene sequence information (biological features) as prior knowledge for identifying gene-disease associations, which restricts the identification process to a single gene-disease association. In this study, we propose and analyze several novel computational methods for identifying genes associated with diseases. We introduce advanced topological and biological features that are currently overlooked when identifying candidate genes. We evaluate different computational methods on disease-gene association data from DisGeNET using 10-fold cross-validation, with TP rate, FP rate, precision, recall, F-measure, and the ROC curve as evaluation measures. The results show that various computational methods with the advanced feature set outperform previous state-of-the-art techniques, achieving precision up to 93.8%, recall up to 93.1%, and F-measure up to 92.9%. Notably, we apply our methods to study four major diseases: Thalassemia, Diabetes, Malaria, and Asthma. Simulation results show that the proposed Deep Extreme Learning Machine (DELM) gives more accurate results than previously published approaches.
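
The evaluation measures named above can be illustrated with a minimal sketch. It assumes binary labels (1 = gene associated with the disease, 0 = not associated) and a classifier that outputs either hard predictions or scores. The function names (binary_metrics, roc_auc, k_fold_indices) are hypothetical and chosen for illustration; the abstract does not state how the authors computed these measures or built their folds, so this is not their implementation.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """TP rate, FP rate, precision, recall, and F-measure for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as the TP rate
    return {
        "tp_rate": recall,
        "fp_rate": ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs; each index is
    used for testing exactly once. No shuffling or stratification."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits
```

In a cross-validation run, one would train on each train split, predict on the matching test split, and average the metrics over the folds. The splitter here is deliberately simple; with imbalanced gene-disease data a stratified split is usually preferable.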
