Summarisation of weighted networks

Networks often contain implicit structure. We introduce novel problems and methods that look for structure in networks by grouping nodes into supernodes and edges into superedges, and then make this structure visible to the user in a smaller generalised network. This task of finding generalisations of nodes and edges is formulated as ‘network summarisation’. We propose models and algorithms for networks that have weights on edges, on nodes, or on both, and study three new variants of the network summarisation problem. In edge-based weighted network summarisation, the summarised network should preserve edge weights as well as possible. A wider class of settings is considered in path-based weighted network summarisation, where the resulting summarised network should preserve longer-range connectivities between nodes. Node-based weighted network summarisation, in turn, allows weights on nodes as well, and aims to preserve more of the information related to high-weight nodes. We study theoretical properties of these problems and show them to be NP-hard. We propose a range of heuristic generalisation algorithms with different trade-offs between complexity and quality of the result. Comprehensive experiments on real data show that weighted networks can be summarised efficiently with relatively little error.
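To make the edge-based variant concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the paper. It assumes an undirected network, a given partition of nodes into supernodes, a superedge weight equal to the mean weight over all node pairs it covers (absent edges counted as 0), and an error measured as the Euclidean distance between original and reconstructed weights. The function names `summarise` and `reconstruction_error` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def summarise(weights, groups):
    """Compute superedge weights for a given grouping of nodes.

    weights: dict mapping frozenset({u, v}) -> weight (undirected edges).
    groups:  list of disjoint node sets (the supernodes).
    Assumption: a superedge weight is the mean over all node pairs it
    covers, with absent edges counted as weight 0.
    """
    super_w = {}
    for i in range(len(groups)):
        for j in range(i, len(groups)):
            if i == j:
                # Pairs inside one supernode (a self-loop superedge).
                pairs = list(combinations(sorted(groups[i]), 2))
            else:
                pairs = [(u, v) for u in groups[i] for v in groups[j]]
            if pairs:
                total = sum(weights.get(frozenset(p), 0.0) for p in pairs)
                super_w[(i, j)] = total / len(pairs)
    return super_w


def reconstruction_error(weights, groups, super_w):
    """Euclidean distance between original and reconstructed edge weights.

    Assumption: every node pair is reconstructed with the weight of the
    superedge that covers it.
    """
    node_group = {u: i for i, g in enumerate(groups) for u in g}
    squared = 0.0
    for u, v in combinations(sorted(node_group), 2):
        i, j = sorted((node_group[u], node_group[v]))
        approx = super_w.get((i, j), 0.0)
        squared += (weights.get(frozenset((u, v)), 0.0) - approx) ** 2
    return sqrt(squared)


# Toy network: nodes a and b have similar connections to c and d.
toy_weights = {
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.6,
    frozenset(("a", "d")): 0.2,
    frozenset(("b", "d")): 0.4,
}
toy_groups = [{"a", "b"}, {"c"}, {"d"}]
toy_super = summarise(toy_weights, toy_groups)
toy_error = reconstruction_error(toy_weights, toy_groups, toy_super)
```

In this toy case, merging `a` and `b` replaces four edges with two superedges whose weights are the averages of the edges they cover, and each original weight is reconstructed with a deviation of 0.1. A heuristic summarisation algorithm would search over groupings to keep such an error small for a given summary size.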
