Hermes: clustering users in large-scale e-mail services

Hermes is an optimization engine for large-scale enterprise e-mail services. Such services may be hosted by a virtualized e-mail service provider or in dedicated enterprise data centers. In both cases, we observe that the pattern of e-mails exchanged between employees of an enterprise forms an implicit social graph. Hermes tracks this graph, periodically identifies clusters of strongly connected users within it, and co-locates the users of each cluster on the same server. Co-location reduces storage requirements, because senders and recipients who reside on the same server can share a single copy of an e-mail. It also reduces inter-server bandwidth usage. We evaluate Hermes using a trace of all e-mails within a major corporation over a five-month period. The e-mail service supports over 120,000 users on 68 servers. Our evaluation shows that Hermes yields storage savings of 37% and bandwidth savings of 50% compared to current approaches. The overheads are low: a single commodity server can run the optimization for the entire system.
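The pipeline described above (build the implicit social graph, cluster it, co-locate each cluster, count stored copies) can be illustrated with a minimal Python sketch. This is not Hermes's algorithm, which the abstract does not specify. The greedy heaviest-edge merging is a simple stand-in for a real graph partitioner. The function names, the (sender, recipients) trace format, the per-server user capacity, and the rule of one stored copy per server that hosts a participant are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(emails):
    """Count e-mails per sender/recipient pair (assumed edge-weight definition)."""
    weights = defaultdict(int)
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:
                weights[frozenset((sender, recipient))] += 1
    return weights


def cluster_users(users, weights, capacity):
    """Greedy stand-in for graph partitioning: merge along the heaviest
    edges first, never letting a cluster exceed one server's capacity."""
    parent = {u: u for u in users}
    size = {u: 1 for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _ in edges:
        a, b = sorted(pair)
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= capacity:
            parent[rb] = ra
            size[ra] += size[rb]

    clusters = defaultdict(list)
    for u in users:
        clusters[find(u)].append(u)
    return list(clusters.values())


def place_clusters(clusters, num_servers, capacity):
    """Put each cluster on the least-loaded server; if it does not fit
    there, spill its users one by one onto the least-loaded servers."""
    if sum(len(c) for c in clusters) > num_servers * capacity:
        raise ValueError("not enough total capacity")
    load = [0] * num_servers
    placement = {}
    for cluster in sorted(clusters, key=len, reverse=True):
        target = min(range(num_servers), key=lambda s: load[s])
        fits = load[target] + len(cluster) <= capacity
        for user in cluster:
            server = target if fits else min(range(num_servers), key=lambda s: load[s])
            placement[user] = server
            load[server] += 1
    return placement


def copies_stored(emails, placement):
    """Assumed storage model: one copy per distinct server that hosts
    the sender or any recipient of a message."""
    total = 0
    for sender, recipients in emails:
        servers = {placement[sender]} | {placement[r] for r in recipients}
        total += len(servers)
    return total


if __name__ == "__main__":
    # Toy trace: two tight groups with a single cross-group message.
    emails = [("a", ["b", "c"])] * 3 + [("d", ["e", "f"])] * 3 + [("a", ["d"])]
    users = ["a", "b", "c", "d", "e", "f"]
    clusters = cluster_users(users, build_graph(emails), capacity=3)
    clustered = place_clusters(clusters, num_servers=2, capacity=3)
    round_robin = {u: i % 2 for i, u in enumerate(users)}
    print("copies, round-robin placement:", copies_stored(emails, round_robin))
    print("copies, clustered placement:  ", copies_stored(emails, clustered))
```

The same placement-dependent count could be extended to inter-server bandwidth by charging one transfer for each server other than the sender's that must receive a copy. A production system would also have to weigh migration cost when users are periodically re-clustered, which this sketch ignores.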
