Individual and Collective Graph Mining: Principles, Algorithms, and Applications

Abstract: Graphs naturally represent information ranging from links between web pages, to communication in email networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company? This book presents scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we also contribute graph-theoretical ideas and models, and real-world applications in two main areas:

• Individual Graph Mining: We show how to interpretably summarize a single graph by identifying its important graph structures. We complement summarization with inference, which leverag...
