Review of network abstraction techniques

Networks are a common way of representing linked information. The goal of network abstraction is to transform a large network into a smaller one that serves as a useful summary of the original. In this paper we review approaches and techniques proposed for abstracting large networks. We classify the approaches along two axes. The first axis is the elementary simplification technique used: pruning of (irrelevant) nodes and edges, partitioning into several smaller networks, and generalization, in which subnetworks are replaced by more general structures. The second axis distinguishes objective from subjective methods; the latter aim to retain more information about those parts of a network that the user has indicated as interesting. We conclude the review with a brief analysis of which intersections of the two axes are the least researched and could therefore hold potential for future work.
