Network Simplification with Minimal Loss of Connectivity

We propose a novel problem: simplifying weighted graphs by pruning their least important edges. Simplified graphs can be used to improve the visualization of a network, to extract its main structure, or as a pre-processing step for other data mining algorithms. We define a graph connectivity function based on the best paths between all pairs of nodes. Given the number of edges to be pruned, the problem is then to choose which edges to prune so that the remaining graph best maintains the overall connectivity. Our model is applicable to a wide range of settings, including probabilistic graphs, flow graphs and distance graphs, since the path quality function used to find best paths can be defined by the user. We analyze the problem and give lower bounds for the effect of individual edge removal in the case where the path quality function has a natural recursive property. We then propose a range of algorithms and report experimental results on real networks derived from public biological databases. The results show that a large fraction of edges can be removed quickly and with minimal effect on the overall graph connectivity. A rough semantic analysis of the removed edges indicates that few important edges were removed, and that the proposed approach could be a valuable tool for helping users view or explore weighted graphs.
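The connectivity function and the pruning problem described above can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: edge qualities lie in [0, 1], path quality is the product of edge qualities (or their minimum, for bottleneck-style flow graphs), graph connectivity is the sum of best-path qualities over all unordered node pairs (0 for disconnected pairs), and edges are pruned by a naive greedy rule that repeatedly removes the edge whose deletion costs the least connectivity. The names best_qualities, connectivity and greedy_prune are hypothetical; this is a brute-force baseline, not the algorithms proposed by the authors.

```python
import heapq
import itertools
import operator
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Edge = Tuple[Node, Node]
Combine = Callable[[float, float], float]


def best_qualities(nodes: Sequence[Node], edges: Dict[Edge, float],
                   source: Node, combine: Combine = operator.mul) -> Dict[Node, float]:
    """Best path quality from source to every node (0.0 if unreachable).

    Dijkstra-style search that maximizes quality. Assumption: qualities are
    in [0, 1], 1.0 is the quality of the empty path, and combine never
    increases quality (true for product and for min on [0, 1]).
    """
    adj: Dict[Node, List[Tuple[Node, float]]] = {n: [] for n in nodes}
    for (u, v), q in edges.items():  # undirected graph
        adj[u].append((v, q))
        adj[v].append((u, q))
    best = {n: 0.0 for n in nodes}
    best[source] = 1.0
    tie = itertools.count()  # tie-breaker so nodes never need to be comparable
    heap = [(-1.0, next(tie), source)]  # max-heap via negated quality
    done = set()
    while heap:
        neg_q, _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, q_edge in adj[u]:
            cand = combine(-neg_q, q_edge)
            if cand > best[v]:
                best[v] = cand
                heapq.heappush(heap, (-cand, next(tie), v))
    return best


def connectivity(nodes: Sequence[Node], edges: Dict[Edge, float],
                 combine: Combine = operator.mul) -> float:
    """Assumed definition: sum of best-path qualities over unordered node pairs."""
    order = list(nodes)
    total = 0.0
    for i, s in enumerate(order):
        best = best_qualities(order, edges, s, combine)
        total += sum(best[t] for t in order[i + 1:])
    return total


def greedy_prune(nodes: Sequence[Node], edges: Dict[Edge, float], k: int,
                 combine: Combine = operator.mul) -> Tuple[Dict[Edge, float], List[Edge]]:
    """Hypothetical baseline: remove k edges, each time the one whose
    removal lowers connectivity the least. Returns (kept, removed)."""
    kept = dict(edges)
    removed: List[Edge] = []
    for _ in range(min(k, len(kept))):
        base = connectivity(nodes, kept, combine)
        losses = {}
        for e in kept:
            trial = {f: q for f, q in kept.items() if f != e}
            losses[e] = base - connectivity(nodes, trial, combine)
        victim = min(losses, key=losses.get)  # ties: first edge in insertion order
        removed.append(victim)
        del kept[victim]
    return kept, removed


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    edges = {("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.5}
    kept, removed = greedy_prune(nodes, edges, k=1)
    print(removed)  # [('a', 'c')]
```

In the toy graph, the direct edge a–c (quality 0.5) is weaker than the two-hop path a–b–c (0.9 × 0.9 = 0.81), so removing it loses no connectivity at all. Each greedy step recomputes all-pairs best paths for every candidate edge, which is only practical for very small graphs; passing combine=min gives a bottleneck-style quality for flow graphs, while distance graphs would first need their lengths mapped to qualities in [0, 1].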
