A study of e-mail patterns

Although electronic mail is an increasingly important service, there are few empirical studies of e-mail traffic. We observed over 2.85 million messages passing through our departmental servers over the course of seven months and derived distributions that approximate several important e-mail parameters, including message sizes, message senders and receivers, and the burstiness of message deliveries. Our work is unique in that we also analyse message payloads: attachment content types, e-mail redundancy, and the use of e-mail as a sharing mechanism. These data can be used to develop e-mail workloads for mail system engineering or benchmarking. To this end, we provide an improved version of Postmark, a small-file Internet benchmark, that better approximates mail server characteristics. Copyright © 2007 John Wiley & Sons, Ltd.
