Traffic Characteristics and Communication Patterns in Blogosphere

We present a thorough characterization of the access patterns in blogspace, a fast-growing constituent of the content available through the Internet. Blogspace comprises a rich, interconnected web of blog postings and comments by an increasingly prominent user community, which collectively define what has become known as the blogosphere. Our characterization covers over 35 million read, write, and administrative requests spanning a 28-day period, and is done from three different blogosphere perspectives. The server view characterizes the aggregate access patterns of all users to all blogs; the user view characterizes how individual users interact with blogosphere objects (blogs); the object view characterizes how individual blogs are accessed. Our findings support two important conclusions. First, we show that the nature of interactions between users and objects in blogspace is fundamentally different from that observed in traditional web content. Access to objects in blogspace can be conceived as part of an interaction between an author and his or her readership. As we show in our work, such interactions range from one-to-many "broadcast-type" and many-to-one "registration-type" communication between an author and his or her readers, to multi-way, iterative "parlor-type" dialogues among members of an interest group. This more interactive nature of the blogosphere leads to traffic and communication patterns that differ from those observed in traditional web content. Second, we identify and characterize novel features of the blogosphere workload, and we investigate the similarities and differences between typical web server and blogosphere server workloads.
