Prior Knowledge Guided Gene-Disease Associations Prediction: An Enhanced Inductive Matrix Completion Approach

Exploring gene-disease associations is of great significance for the early prevention, diagnosis, and treatment of diseases. Most existing methods depend on a specific type of biological evidence and are therefore limited in their application. More importantly, these methods ignore inherent prior knowledge about sparsity and structure that is useful for predicting gene-disease associations. To address these challenges, we propose a novel Enhanced Inductive Matrix Completion (EIMC) model that predicts pathogenic genes by introducing prior sparsity and structure knowledge into traditional Inductive Matrix Completion (IMC). Specifically, the EIMC model employs sparse regularization to preserve the prior sparsity of gene-disease associations and manifold regularization to capture the prior structure information of the data distribution. To the best of our knowledge, EIMC is the first model to incorporate both sparse and manifold regularization into the same objective function. In addition, the EIMC model integrates gene and disease features extracted from various types of biological data, and its inductive learning strategy allows it to make predictions for new genes and diseases. Extensive experimental results demonstrate that the proposed model outperforms other state-of-the-art methods.
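
The abstract does not state the EIMC objective function, so the following is only a minimal sketch of how an IMC-style objective with both a sparsity term and a manifold term could be evaluated. Everything in it is an assumption made for illustration: the function names, the choice of an L1 penalty on the predicted score matrix, the use of unnormalized graph Laplacians built from gene and disease similarity matrices, and the weights `lam_sparse` and `lam_manifold`. It is not the authors' formulation.

```python
import numpy as np


def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric similarity matrix W."""
    return np.diag(W.sum(axis=1)) - W


def eimc_style_objective(A, mask, X, Y, Z, L_gene, L_dis, lam_sparse, lam_manifold):
    """Illustrative objective: masked fit + L1 sparsity + manifold smoothness.

    A      : (n_genes, n_diseases) known association matrix
    mask   : same shape as A, 1 where an entry is observed, 0 otherwise
    X, Y   : gene features (n_genes, f_g) and disease features (n_diseases, f_d)
    Z      : (f_g, f_d) parameter matrix linking the two feature spaces
    L_gene, L_dis : graph Laplacians over genes and over diseases (assumed inputs)
    """
    S = X @ Z @ Y.T  # predicted gene-disease score matrix
    fit = 0.5 * np.sum((mask * (A - S)) ** 2)
    # Assumption: sparsity is encouraged with an L1 penalty on the scores.
    sparsity = lam_sparse * np.abs(S).sum()
    # Assumption: similar genes (and similar diseases) should get similar scores.
    manifold = lam_manifold * (
        np.trace(S.T @ L_gene @ S) + np.trace(S @ L_dis @ S.T)
    )
    return fit + sparsity + manifold


def score_new_gene(x_new, Z, Y):
    """Inductive prediction: score an unseen gene from its feature vector alone."""
    return x_new @ Z @ Y.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes, n_dis, f_g, f_d = 6, 4, 3, 2
    A = (rng.random((n_genes, n_dis)) < 0.2).astype(float)  # toy sparse associations
    mask = np.ones_like(A)
    X = rng.standard_normal((n_genes, f_g))
    Y = rng.standard_normal((n_dis, f_d))
    Z = rng.standard_normal((f_g, f_d))
    W_g = np.abs(X @ X.T)
    np.fill_diagonal(W_g, 0.0)
    W_d = np.abs(Y @ Y.T)
    np.fill_diagonal(W_d, 0.0)
    value = eimc_style_objective(
        A, mask, X, Y, Z, graph_laplacian(W_g), graph_laplacian(W_d), 0.1, 0.01
    )
    print("objective value:", value)
    print("scores for a new gene:", score_new_gene(rng.standard_normal(f_g), Z, Y))
```

Because predictions depend only on the feature matrices and the learned `Z`, a gene or disease that was absent during training can still be scored from its features, which is the sense in which the approach is inductive.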
