Authorship Similarity Detection from Email Messages

It is easy to hide the true identity of the author of an email. The author's name, email address, and other header fields can be changed arbitrarily to deceive the recipient. For example, a sender can change the identity shown in the email header when sending different emails to various recipients. In this paper, we therefore investigate techniques for detecting authorship similarity from the text content of short, topic-free emails. We identify 150 stylistic cues for this problem and propose a method based on frequent-pattern mining and machine learning. Extensive experimental results on the Enron email data set are also presented.
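To make the notion of a stylistic cue concrete, the following is a minimal sketch of how a handful of such cues might be computed from the body of an email. It is not the feature set or the method of the paper: the function name `stylistic_cues`, the `FUNCTION_WORDS` list, and the particular cues chosen are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, tiny list of function words; a real feature set
# would be far larger (the abstract mentions 150 cues in total).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "i", "it")


def stylistic_cues(text):
    """Return a dict of simple, topic-independent style features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    counts = Counter(words)

    cues = {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "vocabulary_richness": len(set(words)) / n_words,
        "uppercase_ratio": sum(c.isupper() for c in text) / n_chars,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "punctuation_ratio": sum(c in ",.;:!?" for c in text) / n_chars,
    }
    # Relative frequency of each function word.
    for w in FUNCTION_WORDS:
        cues["fw_" + w] = counts[w] / n_words
    return cues


if __name__ == "__main__":
    for name, value in stylistic_cues("Hello World. Hello again!").items():
        print(f"{name}: {value:.3f}")
```

Feature vectors of this kind could then be scaled and passed to a pattern-mining or classification step; the choice of scaling and of the downstream learner is left open here.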
