Analysis of Enterprise Email Big Data to Detect Careless Email Activities That May Cause Security Problems

In modern business, email has become the most commonly used means of communication, a popularity attributed to its ease of use and low cost. However, there have been many cases in which a worker's careless use of email compromised a business's security. This paper proposes a method that helps detect such problematic email use by analyzing email data. The method looks for messages that do not appear to request a reply but were nevertheless replied to, which we suggest can be a clue to the writer's carelessness. It vectorizes documents using both a basic bag-of-words model and the word2vec technique, a state-of-the-art method for creating document vectors from text. The Enron email dataset was used as input for an experiment that reports overall classification results. The results indicate which email messages should be examined first when searching for careless email messages.
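The core idea (flag messages that look as if they did not ask for a reply but received one anyway) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the bag-of-words side, leaves out word2vec, and uses a nearest-centroid classifier with cosine similarity as a stand-in, since the abstract does not name the classifier. The label names, the toy training messages, and every function name below are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a message into a bag-of-words vector (lowercased token counts)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_messages):
    """Sum the vectors of each class; cosine similarity ignores the scale."""
    centroids = {}
    for text, label in labeled_messages:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def predict(model, text):
    """Assign the label whose centroid is most similar to the message."""
    vector = bag_of_words(text)
    return max(model, key=lambda label: cosine(model[label], vector))


def flag_unexpected_replies(model, messages):
    """Return messages predicted not to expect a reply that were replied to."""
    return [
        text
        for text, was_replied in messages
        if was_replied and predict(model, text) == "no_reply_expected"
    ]


# Toy, hand-labeled data (hypothetical; the paper uses the Enron dataset).
TRAIN = [
    ("could you please send me the report", "reply_expected"),
    ("can you confirm the meeting time please", "reply_expected"),
    ("fyi the office is closed on monday", "no_reply_expected"),
    ("fyi the server maintenance is finished", "no_reply_expected"),
]

# Each entry is (message text, whether someone replied to it).
MESSAGES = [
    ("can you please send the contract", True),
    ("fyi the office parking is closed", True),
    ("fyi the maintenance is finished", False),
]

if __name__ == "__main__":
    model = train(TRAIN)
    for text in flag_unexpected_replies(model, MESSAGES):
        print("watch first:", text)
```

On this toy data only the second message is flagged: it resembles the informational training messages yet was replied to. A real pipeline would need proper tokenization, a held-out evaluation, and reply links reconstructed from the mail headers, none of which are shown here.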

[1] Yiming Yang et al. Introducing the Enron Corpus. CEAS, 2004.