A Framework for Path-Oriented Network Simplification

We propose a generic framework and methods for simplifying large networks. The methods can be used to improve the understandability of a given network, to complement user-centric analysis methods, or as a pre-processing step for computationally more complex methods. The approach is path-oriented: edges are pruned while the original quality of the best paths between all pairs of nodes is kept (though not necessarily every best path survives). The framework is applicable to different kinds of graphs (for instance flow networks and random graphs), and connections can be measured in different ways (for instance by the shortest path, maximum flow, or maximum probability). It has relative neighborhood graphs, spanning trees, and certain Pathfinder graphs as special cases. We give four algorithmic variants and report on experiments with 60 real biological networks. The simplification methods are part of ongoing projects for intelligent analysis of networked information.
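The path-oriented pruning idea can be illustrated with a minimal Python sketch. It is a simple greedy pass written for illustration. It is not the authors' algorithm or any of their four variants. It assumes a simple undirected graph given as a dict `{(u, v): weight}` with each edge listed once and no self-loops. It also assumes path quality is maximized and combined along a path by a function `combine(q, w)` that never exceeds `q`. Under that assumption, maximum probability uses multiplication with identity `1.0`, shortest paths are expressed by maximizing negated lengths with addition and identity `0`, and a bottleneck (widest-path) capacity with `min` and identity infinity stands in, as a simplification, for flow-like connection strength. The names `build_adj`, `best_quality` and `simplify` are hypothetical. An edge is dropped only if its endpoints remain joined by a path at least as good in the current graph. Any best path through that edge can therefore be rerouted, so best-path quality between every pair is preserved, while ties mean not every best path is kept.

```python
import heapq
import itertools
import math
import operator
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency map {u: {v: weight}} built from {(u, v): weight}."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    return adj


def best_quality(adj, source, target, combine, identity):
    """Quality of the best source-target path, or None if none exists.

    Generalized Dijkstra (maximizing): assumes combine(q, w) <= q, i.e.
    extending a path never improves it, and that combine is monotone in q.
    """
    tie = itertools.count()  # tie-breaker so nodes are never compared
    best = {source: identity}
    heap = [(-identity, next(tie), source)]  # negate qualities for a max-heap
    done = set()
    while heap:
        neg_q, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            return -neg_q
        for v, w in adj[u].items():
            q = combine(-neg_q, w)
            if v not in done and (v not in best or q > best[v]):
                best[v] = q
                heapq.heappush(heap, (-q, next(tie), v))
    return None


def simplify(edges, combine, identity):
    """Greedily drop each edge whose endpoints stay linked by an equally good path."""
    adj = build_adj(edges)
    kept = set(edges)
    # Try the weakest edges first; they are the most likely to be redundant.
    for u, v in sorted(edges, key=lambda e: combine(identity, edges[e])):
        w = adj[u].pop(v)  # tentatively remove the edge
        del adj[v][u]
        alt = best_quality(adj, u, v, combine, identity)
        if alt is not None and alt >= combine(identity, w):
            kept.discard((u, v))  # an alternative path is at least as good
        else:
            adj[u][v] = adj[v][u] = w  # edge is needed: restore it
    return kept


# Maximum probability: the direct a-c edge (0.5) loses to a-b-c (0.81).
print(sorted(simplify({("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.5},
                      operator.mul, 1.0)))
# [('a', 'b'), ('b', 'c')]

# Bottleneck capacity: a-b-c carries min(5, 4) = 4 >= 3, so a-c is pruned.
print(sorted(simplify({("a", "b"): 5, ("b", "c"): 4, ("a", "c"): 3},
                      min, math.inf)))
# [('a', 'b'), ('b', 'c')]
```

Processing the weakest edges first is one reasonable heuristic. The order matters only when several paths tie, which is exactly where the method is free to keep one best path and discard the others. In the bottleneck triangle the result happens to be a spanning tree, in line with the abstract's remark that spanning trees arise as special cases.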
