Transfer learning across ontologies for phenome‐genome association prediction

Motivation: To better predict and analyze gene associations with the phenotypes organized in a phenotype ontology, it is crucial to model the hierarchical structure among the phenotypes effectively and to supplement the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP), which imposes consistent associations along entire phenotype paths when predicting phenotype‐gene associations in the Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) that incorporates functional annotations from the Gene Ontology (GO). By simultaneously reconstructing GO term‐gene associations and HPO phenotype‐gene associations for all the genes in a protein‐protein interaction network, tlDLP benefits from the enriched training associations indirectly, through their relation with GO terms.

Results: In experiments predicting associations between human genes and HPO phenotypes on a human protein‐protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO, both in cross‐validation and in predicting the most recent associations added after the snapshot of the training data. Moreover, transfer learning through GO term‐gene associations significantly improved association predictions, by a large margin, for phenotypes with no more specific known associations. Examples are also shown to demonstrate how phenotype paths in the phenotype ontology and transfer learning with the Gene Ontology can improve the predictions.

Availability and Implementation: Source code is available at http://compbio.cs.umn.edu/ontophenome.

Contact: kuang@cs.umn.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
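
To make the idea of spreading sparse known associations over a protein‐protein interaction network concrete, the following is a minimal sketch of generic network label propagation for a single phenotype. It is not the DLP or tlDLP model from the paper: it omits the phenotype‐path consistency and the GO transfer terms, and the toy network, the function names `normalize` and `propagate`, and the parameter `alpha` are assumptions made purely for illustration.

```python
import math

def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an adjacency matrix."""
    deg = [sum(row) for row in W]
    n = len(W)
    # Entries with no edge stay 0.0, which also avoids dividing by a zero degree.
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def propagate(W, y, alpha=0.5, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y until (approximately) converged."""
    S = normalize(W)
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Hypothetical toy network shaped like a path: gene0 - gene1 - gene2 - gene3.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

# Only gene0 has a known association with the phenotype of interest.
y = [1.0, 0.0, 0.0, 0.0]

scores = propagate(W, y)
for gene, score in enumerate(scores):
    print(f"gene{gene}: {score:.4f}")
```

In this toy setting the scores decay with network distance from the single known gene, which is the basic behavior that network-based association predictors build on. A model in the spirit of the abstract would additionally couple the scores of phenotypes that lie on the same ontology path and share information with GO term‐gene associations.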
