Distance Preserving Graph Simplification

Large graphs are difficult to represent, visualize, and understand. In this paper, we introduce the “gate graph”, a new approach to graph simplification. A gate graph provides a simplified topological view of the original graph. Specifically, we construct a gate graph from a large graph so that, for any “non-local” vertex pair in the original graph (a pair whose distance exceeds some threshold), the shortest-path distance can be recovered by consecutive “local” walks through the gate vertices in the gate graph. We investigate the gate-vertex set discovery problem theoretically: we characterize its computational complexity and derive an upper bound on the size of the minimum gate-vertex set using VC-dimension theory. We propose an efficient mining algorithm that discovers a gate-vertex set with a guaranteed logarithmic bound. Detailed experimental results on both real and synthetic graphs demonstrate the effectiveness and efficiency of our approach.
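
To make the recovery property concrete, the following minimal sketch checks whether a candidate gate-vertex set preserves distances on a small unweighted, undirected graph. It is an illustration under stated assumptions, not the mining algorithm of the paper: a pair is treated as non-local when its distance exceeds a threshold `eps`, a local walk is a hop of distance at most `eps`, and the names `bfs_distances`, `recovered_distance`, and `is_gate_set` are hypothetical.

```python
from collections import deque
import heapq
from itertools import combinations
from math import inf


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def recovered_distance(dist, gates, u, v, eps):
    """Shortest u-v length using only local hops (<= eps) relayed by gates.

    Assumption: every intermediate stop must be a gate vertex, and each hop
    costs the true distance between its endpoints in the original graph.
    """
    best = {u: 0}
    heap = [(0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > best.get(x, inf):
            continue  # stale heap entry
        if x == v:
            return d
        for y, hop in dist[x].items():
            if y == x or hop > eps:
                continue  # not a local hop
            if y != v and y not in gates:
                continue  # only gate vertices may relay
            nd = d + hop
            if nd < best.get(y, inf):
                best[y] = nd
                heapq.heappush(heap, (nd, y))
    return inf


def is_gate_set(adj, gates, eps):
    """True if every non-local pair's distance is recovered through gates."""
    dist = {x: bfs_distances(adj, x) for x in adj}
    for u, v in combinations(adj, 2):
        true_d = dist[u].get(v)
        if true_d is None or true_d <= eps:
            continue  # disconnected or local pair: nothing to recover
        if recovered_distance(dist, gates, u, v, eps) != true_d:
            return False
    return True


# Path graph 0-1-2-3-4-5-6 with threshold eps = 2.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(is_gate_set(path, {2, 4}, 2))  # True
print(is_gate_set(path, {3}, 2))     # False
```

On the 7-vertex path, the set {2, 4} suffices because every vertex is within two hops of a gate and the two gates are within two hops of each other. The single gate {3} fails because vertex 0 cannot reach it with a local walk. This checker computes all-pairs distances and is meant only for small examples.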
