Using graph partitioning to discover regions of correlated spatio-temporal change in evolving graphs

There is growing interest in studying dynamic graphs, that is, graphs that evolve over time. In this work, we investigate a new type of dynamic graph analysis: finding regions of a graph that evolve in a similar manner and are topologically similar over a period of time. For example, in event detection and fault diagnosis, these regions can be used to group a set of changes that have a common cause. Prior work [6] proposed a greedy framework called cSTAG to find these regions. cSTAG is accurate on datasets where the regions are temporally and spatially well separated, but it produces incorrect groupings when they are not. In this paper, we propose a new algorithm called regHunter. It treats region discovery as a multi-objective optimisation problem and uses a multi-level graph partitioning algorithm to discover the regions of correlated change. In addition, we propose an external clustering validation technique and use several existing internal measures to evaluate the accuracy of regHunter. On synthetic datasets, we found regHunter to be significantly more accurate than cSTAG on dynamic graphs whose regions have small separation. On two real datasets (the access graph of the 1998 World Cup website, and the BGP connectivity graph during the landfall of Hurricane Katrina), regHunter also obtained more accurate results than cSTAG. Furthermore, regHunter discovered two interesting regions in the World Cup access graph that cSTAG was not able to find.

[1]  Philip S. Yu,et al.  GraphScope: parameter-free mining of large time-evolving graphs , 2007, KDD '07.

[2]  Hongyuan Zha,et al.  A new Mallows distance based metric for comparing clusterings , 2005, ICML '05.

[3]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[4]  C. Ding,et al.  A MinMaxCut Spectral Method for Data Clustering and Graph Partitioning , 2003 .

[5]  Hans-Peter Kriegel,et al.  Pattern Mining in Frequent Dynamic Subgraphs , 2006, Sixth International Conference on Data Mining (ICDM'06).

[6]  James Bailey,et al.  Clustering Similarity Comparison Using Density Profiles , 2006, Australian Conference on Artificial Intelligence.

[7]  Matthew C. Caesar,et al.  Towards Localizing Root Causes of BGP Dynamics , 2003 .

[8]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[9]  Malgorzata Steinder,et al.  Probabilistic fault localization in communication systems using belief networks , 2004, IEEE/ACM Transactions on Networking.

[10]  Horst Bunke,et al.  Detection of Abnormal Change in a Time Series of Graphs , 2002, J. Interconnect. Networks.

[11]  James Bailey,et al.  Discovering correlated spatio-temporal changes in evolving graphs , 2007, Knowledge and Information Systems.

[12]  Jacques Teghem,et al.  Multiple Criteria Optimization, State of the Art. Annotated Bibliographic Surveys, M. Ehrgott, X. Gandibleux (Eds.). Kluwer International Series (Operations Research and Management Science), Boston, Dordrecht, London (2002), (496pp.), ISBN: 1-4020-7128-0 , 2005 .

[13]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Anja Feldmann,et al.  Locating internet routing instabilities , 2004, SIGCOMM 2004.

[15]  David G. Luenberger,et al.  Linear and Nonlinear Programming: Second Edition , 2003 .

[16]  Inderjit S. Dhillon,et al.  A fast kernel-based multilevel algorithm for graph clustering , 2005, KDD '05.

[17]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[18]  Stuart Clare,et al.  Functional MRI : Methods and Applications , 1997 .

[19]  Ravi Kumar,et al.  Structure and evolution of online social networks , 2006, KDD '06.

[20]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput.

[21]  Takashi Washio,et al.  State of the art of graph-based data mining , 2003, SKDD.

[22]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[23]  Gene H. Golub,et al.  Matrix computations , 1983 .

[24]  Martin Arlitt,et al.  Workload Characterization of the 1998 World Cup Web Site , 1999 .

[25]  G. Karypis,et al.  Multilevel k-way hypergraph partitioning , 1999, Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361).

[26]  Carl D. Meyer,et al.  Matrix Analysis and Applied Linear Algebra , 2000 .

[27]  Michalis Vazirgiannis,et al.  On Clustering Validation Techniques , 2001 .

[28]  Shashi Shekhar,et al.  Mixed-Drove Spatio-Temporal Co-occurrence Pattern Mining : A Summary of Results , 2006 .

[29]  Ravi Kumar,et al.  On the Bursty Evolution of Blogspace , 2003, WWW '03.

[30]  Nick Feamster,et al.  Some Foundational Problems in Interdomain Routing , 2004 .