Fires on the Web: Towards Efficient Exploring Historical Web Graphs

Discovering evolving regions in large graphs is an important problem because it underlies many applications, such as spam website detection on the Web and community lifecycle exploration in social networks. In this paper, we study a new problem: exploring the evolution process between two historical snapshots of an evolving graph. We present a formal definition of this problem and simulate the evolution process as a fire propagation scenario based on the Forest Fire Model (FFM) [17]. We propose two efficient solutions to this problem, both grounded in probabilistic guarantees. The experimental results show that our solutions are efficient and that they fit the major characteristics of evolving graphs well.
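To make the fire propagation idea concrete, the following is a minimal sketch of a forest-fire style burn on a directed graph. It is not the paper's algorithm: the function name `forest_fire_burn`, the adjacency-dictionary representation, and the single burning probability `p` are assumptions chosen for illustration. At each burning node, the number of unburned neighbors that catch fire is drawn from a geometric distribution (mean p/(1-p)), truncated at the number of available neighbors.

```python
import random
from collections import deque


def forest_fire_burn(adj, seed, p, rng):
    """Return the set of nodes burned by a fire started at `seed`.

    adj  : dict mapping each node to a list of its out-neighbors (assumed format)
    seed : node where the fire starts
    p    : forward burning probability, 0 <= p <= 1 (hypothetical parameter)
    rng  : a random.Random instance, passed in for reproducibility
    """
    burned = {seed}
    frontier = deque([seed])
    while frontier:
        node = frontier.popleft()
        # Only neighbors that have not burned yet can catch fire.
        candidates = [n for n in adj.get(node, ()) if n not in burned]
        # Geometric draw: keep adding one more neighbor with probability p,
        # stopping at the first failure or when candidates run out.
        k = 0
        while k < len(candidates) and rng.random() < p:
            k += 1
        for n in rng.sample(candidates, k):
            burned.add(n)
            frontier.append(n)
    return burned


# Toy directed graph; node names are illustrative only.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
burned_nodes = forest_fire_burn(toy_graph, "a", 0.5, random.Random(0))
```

With `p = 0` the fire never leaves the seed, and with `p = 1` it burns every node reachable from the seed; intermediate values give a random burned region whose size grows with `p`.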

[1] David Liben-Nowell, et al. The link-prediction problem for social networks, 2007.

[2] Nick Koudas, et al. BlogScope: spatio-temporal analysis of the blogosphere, 2007, WWW '07.

[3] Masaru Kitsuregawa, et al. What's really new on the web?: identifying new pages from a series of unstable web snapshots, 2006, WWW '06.

[4] Takashi Washio, et al. A Fast Method to Mine Frequent Subsequences from Graph Sequence Data, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[5] Jure Leskovec, et al. Microscopic evolution of social networks, 2008, KDD.

[6] Philip S. Yu, et al. GraphScope: parameter-free mining of large time-evolving graphs, 2007, KDD '07.

[7] Andrei Z. Broder, et al. Graph structure in the Web, 2000, Comput. Networks.

[8] Jeffrey Xu Yu, et al. Spotting Significant Changing Subgraphs in Evolving Graphs, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[9] Philippa Pattison, et al. Dynamic Social Network Modelling and Analysis, 2003.

[10] Mark E. J. Newman, et al. The Structure and Function of Complex Networks, 2003, SIAM Rev.

[11] Boris E Shakhnovich, et al. Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation, 2004, Journal of molecular biology.

[12] Frank Wm. Tompa, et al. Seeking Stable Clusters in the Blogosphere, 2007, VLDB.

[13] Jan Komorowski, et al. Principles of Data Mining and Knowledge Discovery, 2001, Lecture Notes in Computer Science.

[14] Peter G. Doyle, et al. Random Walks and Electric Networks, 1987.

[15] Tanya Y. Berger-Wolf, et al. A framework for community identification in dynamic social networks, 2007, KDD '07.

[16] Hector Garcia-Molina, et al. Combating Web Spam with TrustRank, 2004, VLDB.

[17] Kathleen M. Carley, et al. Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, 2004.

[18] Michalis Faloutsos, et al. A simple conceptual model for the Internet topology, 2001, GLOBECOM'01. IEEE Global Telecommunications Conference (Cat. No.01CH37270).

[19] Christos Faloutsos, et al. Graphs over time: densification laws, shrinking diameters and possible explanations, 2005, KDD '05.

[20] Takashi Washio, et al. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data, 2000, PKDD.

[21] Michalis Faloutsos, et al. On power-law relationships of the Internet topology, 1999, SIGCOMM '99.

[22] Eli Upfal, et al. Stochastic models for the Web graph, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.