Simplification of Networks by Edge Pruning

We propose a novel problem: simplifying weighted graphs by pruning their least important edges. Simplified graphs can be used to improve the visualization of a network, to extract its main structure, or as a pre-processing step for other data mining algorithms. We define a graph connectivity function based on the best paths between all pairs of nodes. Given the number of edges to be pruned, the problem is then to select the edges to prune so that the overall graph connectivity is maintained as well as possible. Because the path quality function used to find best paths can be defined by the user, our model applies to a wide range of settings, including probabilistic graphs, flow graphs and distance graphs. We analyze the problem and give lower bounds for the effect of removing an individual edge in the case where the path quality function has a natural recursive property. We then propose a range of algorithms and report experimental results on real networks derived from public biological databases. The results show that a large fraction of edges can be removed quickly and with minimal effect on the overall graph connectivity. A rough semantic analysis of the removed edges indicates that few important edges were removed, and that the proposed approach could be a valuable tool for helping users view or explore weighted graphs.
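To make the problem statement concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It assumes an undirected probabilistic graph whose edge weights lie in (0, 1], takes the quality of a path to be the product of its edge weights, and takes graph connectivity to be the average best-path quality over all node pairs. The function names (`best_path_quality`, `connectivity`, `greedy_prune`) and the brute-force greedy strategy are illustrative assumptions.

```python
import heapq
from itertools import combinations


def best_path_quality(adj, source):
    """Best path quality from source to each reachable node.

    Assumption: quality of a path = product of edge weights in (0, 1],
    so a Dijkstra-style search that maximizes the product is valid.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated quality
    while heap:
        neg_q, u = heapq.heappop(heap)
        q = -neg_q
        if q < best.get(u, 0.0):
            continue  # stale heap entry
        for v, w in adj[u].items():
            cand = q * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


def connectivity(nodes, edges):
    """Average best-path quality over all unordered node pairs.

    Disconnected pairs contribute 0 (an assumption of this sketch).
    """
    adj = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        total += best_path_quality(adj, u).get(v, 0.0)
    return total / len(pairs)


def greedy_prune(nodes, edges, k):
    """Remove k edges, each time dropping the edge whose removal
    leaves the highest connectivity (brute force, for illustration)."""
    remaining = dict(edges)
    for _ in range(min(k, len(remaining))):
        def score(edge):
            rest = {e: w for e, w in remaining.items() if e != edge}
            return connectivity(nodes, rest)
        victim = max(sorted(remaining), key=score)
        del remaining[victim]
    return remaining


# Hypothetical toy graph: the direct edge a-c is weaker than the path a-b-c.
nodes = ["a", "b", "c"]
edges = {("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.5}
pruned = greedy_prune(nodes, edges, 1)
print(sorted(pruned))
```

In this toy graph the best path between a and c already runs through b, so pruning the edge a-c leaves the connectivity unchanged. The brute-force loop recomputes all-pairs best paths for every candidate edge and is only practical for small graphs.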
