Candidate gene prioritization using graph embedding

Candidate genes prioritization allows to rank among a large number of genes, those that are strongly associated with a phenotype or a disease. Due to the important amount of data that needs to be integrate and analyse, gene-to-phenotype association is still a challenging task. In this paper, we evaluated a knowledge graph approach combined with embedding methods to overcome these challenges. We first introduced a dataset of rice genes created from several open-access databases. Then, we used the Translating Embedding model and Convolution Knowledge Base model, to vectorize gene information. Finally, we evaluated the results using link prediction performance and vectors representation using some unsupervised learning techniques.

[1]  Yves Moreau,et al.  Candidate gene prioritization with Endeavour , 2016, Nucleic Acids Res..

[2]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[3]  Jason Weston,et al.  Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.

[4]  Pasquale Minervini,et al.  Convolutional 2D Knowledge Graph Embeddings , 2017, AAAI.

[5]  Dai Quoc Nguyen,et al.  A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network , 2017, NAACL.

[6]  Y. Ouyang,et al.  funRiceGenes dataset for comprehensive understanding and application of rice functional genes , 2017, GigaScience.

[7]  Evgeniy Gabrilovich,et al.  A Review of Relational Machine Learning for Knowledge Graphs , 2015, Proceedings of the IEEE.

[8]  Hua Wu,et al.  An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge , 2017, ACL.

[9]  Jianfeng Gao,et al.  Embedding Entities and Relations for Learning and Inference in Knowledge Bases , 2014, ICLR.

[10]  Hongyu Zhao,et al.  A review of post-GWAS prioritization approaches , 2013, Front. Genet..

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Yoshihiro Kawahara,et al.  Rice Annotation Project Database (RAP-DB): An Integrative and Interactive Database for Rice Genomics , 2013, Plant & cell physiology.

[13]  Zhiyuan Liu,et al.  Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.

[14]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[15]  Zhen Wang,et al.  Knowledge Graph Embedding by Translating on Hyperplanes , 2014, AAAI.

[16]  Y. Yamazaki,et al.  Oryzabase. An Integrated Biological and Genome Information Database for Rice1[OA] , 2005, Plant Physiology.

[17]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[18]  Tom M. Mitchell,et al.  Leveraging Knowledge Bases in LSTMs for Improving Machine Reading , 2017, ACL.

[19]  Bo Wang,et al.  Gramene 2018: unifying comparative genomics and pathway resources for plant research , 2017, Nucleic Acids Res..

[20]  James P. Callan,et al.  Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding , 2017, WWW.

[21]  Quan Do,et al.  PyRice: a Python package for querying Oryza Sativa databases , 2020 .

[22]  The UniProt Consortium,et al.  UniProt: a worldwide hub of protein knowledge , 2018, Nucleic Acids Res..

[23]  Yves Moreau,et al.  pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion , 2018, Bioinform..

[24]  Guillaume Bouchard,et al.  Complex Embeddings for Simple Link Prediction , 2016, ICML.

[25]  R. Srinivasan,et al.  Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks , 2018, BMC Medical Genomics.

[26]  Zhang Zhang,et al.  Information Commons for Rice (IC4R) , 2015, Nucleic Acids Res..

[27]  Gerhard Weikum,et al.  NAGA: Searching and Ranking Knowledge , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[28]  Y. Moreau,et al.  Computational tools for prioritizing candidate genes: boosting disease gene discovery , 2012, Nature Reviews Genetics.

[29]  Ming Chen,et al.  PRIN: a predicted rice interactome network , 2011, BMC Bioinformatics.

[30]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[31]  Dai Quoc Nguyen,et al.  A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization , 2018, NAACL.

[32]  D. Schwartz,et al.  Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data , 2013, Rice.

[33]  Inna Dubchak,et al.  Rice SNP-seek database update: new SNPs, indels, and queries , 2016, Nucleic Acids Res..

[34]  Miguel Pérez-Enciso,et al.  A Guide on Deep Learning for Complex Trait Genomic Prediction , 2019, Genes.

[35]  Rahul Gupta,et al.  Knowledge base completion via search-based question answering , 2014, WWW.