On Embedding Uncertain Graphs

Graph data are prevalent in communication networks, social media, and biological networks. These data are often noisy or inexact and can be represented by uncertain graphs, in which each edge carries a probability indicating the chance that it exists. Recently, researchers have studied various mining algorithms (e.g., clustering, classification, and k-NN) for uncertain graphs. These solutions face two problems: (1) high dimensionality: uncertain graphs are often highly complex, which can degrade mining quality; and (2) low reusability: an existing mining algorithm has to be redesigned before it can handle uncertain graphs. To tackle these problems, we propose a solution called URGE, or UnceRtain Graph Embedding. Given an uncertain graph G, URGE generates G's embedding, a set of low-dimensional vectors that carry the proximity information of the nodes in G. The embedding reduces the dimensionality of G without destroying node proximity information. Because the embedding is a simple vector representation, existing mining solutions can be used on it. We investigate two node proximity measures, covering low- and high-order proximity, in the embedding generation process, and develop novel algorithms to evaluate them quickly. To the best of our knowledge, there is no prior study on the use of embedding for uncertain graphs. We have further performed extensive experiments on clustering, classification, and k-NN over several uncertain graph datasets. Our results show that URGE is more effective than current uncertain data mining algorithms as well as state-of-the-art embedding solutions. Embedding generation and mining were also highly efficient in our experiments.
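
To make the notion of an uncertain graph concrete, the following is a minimal sketch and not the URGE algorithm itself. It assumes edges exist independently of one another, uses a hypothetical four-node toy graph, and picks the expected number of common neighbors as an illustrative proximity measure; the abstract does not say which measures URGE uses. All names (`EDGES`, `expected_common_neighbors`, `sampled_common_neighbors`) are made up for this illustration.

```python
import random

# Hypothetical toy uncertain graph: undirected edge -> existence probability.
EDGES = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.5,
    ("b", "c"): 0.8,
    ("c", "d"): 0.4,
}


def edge_prob(edges, u, v):
    """Probability that the undirected edge {u, v} exists (0.0 if absent)."""
    return edges.get((u, v), edges.get((v, u), 0.0))


def nodes_of(edges):
    """Sorted list of all nodes that appear in at least one edge."""
    return sorted({n for e in edges for n in e})


def expected_common_neighbors(edges, u, v):
    """Exact expectation, ASSUMING edges exist independently.

    Node w is a common neighbor with probability p(u, w) * p(v, w),
    and the expectation is the sum of these products over all other nodes.
    """
    return sum(
        edge_prob(edges, u, w) * edge_prob(edges, v, w)
        for w in nodes_of(edges)
        if w not in (u, v)
    )


def sampled_common_neighbors(edges, u, v, n_samples=20000, seed=0):
    """Monte Carlo estimate: sample deterministic graphs and average."""
    rng = random.Random(seed)
    others = [w for w in nodes_of(edges) if w not in (u, v)]
    total = 0
    for _ in range(n_samples):
        # One sampled graph: keep each edge with its own probability.
        world = {e for e, p in edges.items() if rng.random() < p}

        def has(x, y):
            return (x, y) in world or (y, x) in world

        total += sum(1 for w in others if has(u, w) and has(v, w))
    return total / n_samples


if __name__ == "__main__":
    print(expected_common_neighbors(EDGES, "a", "b"))
    print(sampled_common_neighbors(EDGES, "a", "b"))
```

The closed-form expectation avoids enumerating every possible deterministic graph, whose number grows exponentially with the edge count; the sampling routine is included only as a sanity check on the closed form.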
