Random Walks on Hypergraphs with Applications to Disease-Gene Prioritization

Gene interaction networks are typically represented as graphs that store pairwise interactions between genes. This representation is useful in practice because of the extensive literature on statistical graph inference. However, a pairwise representation ignores more complex features of gene interactions, such as gene regulation and assembly. In this thesis, we propose a hypergraph model for gene interaction networks. Since many network-based algorithms rely on random walks, we analyze our model through the lens of random walks. We outline a general framework for random walks on a hypergraph, and we show in a precise sense that random walks on hypergraphs are not special cases of random walks on graphs. We also define a mapping between hypergraphs and graphs, and we use that mapping to build a hypergraph from an existing graph representation of a gene interaction network. We then use this hypergraph network to perform disease-gene prioritization via the PageRank algorithm. For monogenic diseases, we find that the hypergraph noticeably outperforms the graph, demonstrating the value of hypergraphs as gene interaction networks.
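
To make the random-walk and PageRank steps concrete, the following is a minimal sketch and not the thesis's own framework, which the abstract does not specify. It assumes one common hypergraph walk: from the current vertex, pick an incident hyperedge with probability proportional to its weight, then pick a vertex of that hyperedge uniformly at random. Prioritization is then done with personalized PageRank restarted at known disease genes. The function names, the toy hyperedges, the gene labels, and the restart probability of 0.3 are all illustrative assumptions.

```python
from collections import defaultdict


def hypergraph_transition(hyperedges, weights=None):
    """Transition probabilities P[u][v] for an assumed two-step walk:
    choose a hyperedge e containing u with probability w(e) / d(u),
    then choose a vertex v of e uniformly at random (v may equal u)."""
    if weights is None:
        weights = [1.0] * len(hyperedges)
    # Weighted degree d(u): total weight of the hyperedges containing u.
    degree = defaultdict(float)
    for edge, w in zip(hyperedges, weights):
        for u in edge:
            degree[u] += w
    P = defaultdict(lambda: defaultdict(float))
    for edge, w in zip(hyperedges, weights):
        for u in edge:
            for v in edge:
                P[u][v] += (w / degree[u]) * (1.0 / len(edge))
    return P


def personalized_pagerank(P, seeds, restart=0.3, iters=200):
    """Power iteration for PageRank with restart to the seed vertices."""
    nodes = list(P)
    seed_mass = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    rank = dict(seed_mass)
    for _ in range(iters):
        nxt = {u: restart * seed_mass[u] for u in nodes}
        for u in nodes:
            for v, p in P[u].items():
                nxt[v] += (1.0 - restart) * rank[u] * p
        rank = nxt
    return rank


# Toy example (hypothetical genes): one three-gene hyperedge, one pair.
hyperedges = [{"A", "B", "C"}, {"C", "D"}]
P = hypergraph_transition(hyperedges)
scores = personalized_pagerank(P, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `ranking` orders candidate genes by their proximity to the seed gene under the assumed walk; genes that share hyperedges with the seed receive more probability mass than genes reachable only through intermediate vertices.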
