Detecting Viral Propagations Using Email Behavior Profiles

The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles, or models, of user email accounts. These models may be used for a variety of forensic analyses and detection tasks. In this paper we focus on applying these models to detect the early onset of a viral propagation without the "content-based" (or signature-based) analysis commonly used in virus scanners. We present several experiments using real email from 15 users with injected simulated viral emails, and describe how combining different behavior models improves overall detection rates. The performance results vary with parameter settings, approaching a 99% true positive (TP) rate (the percentage of viral emails caught) in general cases, with a 0.38% false positive (FP) rate (the percentage of emails with attachments that are mislabeled as viral). The models used in this study are based on volume and velocity statistics of a user's email rate and on an analysis of the user's (social) cliques as revealed in their email behavior. We show by simulation that virus propagations are detectable because viruses may emit emails at rates that differ from what human behavior suggests is normal, and may direct email to groups of recipients in ways that violate the user's typical communication with their social groups.
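To make the two kinds of behavior model concrete, here is a minimal sketch under stated assumptions; it is not EMT's actual implementation. The clique model builds a recipient co-occurrence graph from a user's past outgoing mail, keeps edges seen in at least `min_support` emails, and enumerates maximal cliques with the standard Bron-Kerbosch recursion. A new multi-recipient email is treated as a violation if no single clique covers its recipients. As a stand-in for the volume/velocity statistics, the sketch counts distinct recipients across a sliding window of the last `window` emails and compares that count with the largest value seen in the user's history. All function names, the `window` and `min_support` parameters, the threshold rule, and the OR-combination of the two models are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(history, min_support=2):
    """Link recipients addressed together in at least min_support past emails."""
    pair_counts = defaultdict(int)
    for rcpts in history:
        for a, b in combinations(sorted(set(rcpts)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def violates_cliques(rcpts, cliques):
    """Hypothetical rule: a multi-recipient email is suspicious if no clique covers it."""
    rcpts = set(rcpts)
    if len(rcpts) < 2:
        return False  # a single recipient carries no group information
    return not any(rcpts <= c for c in cliques)


def distinct_recipients(window_emails):
    """Volume proxy: number of distinct recipients across a window of emails."""
    return len(set().union(*(set(r) for r in window_emails)))


def flag_emails(history, new_emails, window=3, min_support=2):
    """Flag each new outgoing email if EITHER model fires (assumed OR-combination)."""
    cliques = maximal_cliques(cooccurrence_graph(history, min_support))
    # Assumed threshold: the largest distinct-recipient count observed in any
    # sliding window over the user's own history.
    threshold = max(
        (distinct_recipients(history[max(0, i - window + 1): i + 1])
         for i in range(len(history))),
        default=0,
    )
    stream = list(history)
    flags = []
    for rcpts in new_emails:
        stream.append(rcpts)
        too_many = distinct_recipients(stream[-window:]) > threshold
        flags.append(too_many or violates_cliques(rcpts, cliques))
    return flags


# Toy data: each email is represented only by its list of recipients.
history = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["alice", "bob", "carol"],
    ["dave"],
    ["erin", "frank"],
    ["erin", "frank"],
    ["alice"],
]
new_emails = [
    ["alice", "bob"],                                          # normal
    ["bob", "erin"],                                           # crosses social groups
    ["carol", "dave", "erin", "frank", "alice", "bob", "zed"],  # virus-like burst
]
print(flag_emails(history, new_emails, window=3))  # [False, True, True]
```

In this toy history the learned cliques are {alice, bob, carol} and {erin, frank}, and the window threshold is 6. The second email is flagged only by the clique model, while the burst to seven recipients trips both models, which illustrates why combining complementary models can catch more propagations than either alone.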
