A graph model based author attribution technique for single-class e-mail classification

E-mails have increasingly replaced other written modes of communication for important correspondence, including personal and business transactions, and an e-mail is given the same significance as a signed document. Hence, e-mail impersonation through compromised accounts has become a major threat. In this paper, we propose an e-mail style acquisition and classification model for authorship attribution that serves as an effective tool to prevent and detect e-mail impersonation. The proposed model learns an author's e-mail style by being trained only on sample e-mail texts of that author, and then determines whether a given e-mail text is a legitimate e-mail of the author. Extracting the significant features that represent an author's style from the short e-mails available is a major challenge in e-mail authorship attribution. We use a graph-based model to extract the author's distinctive feature set, and a one-class SVM classifier to handle the single-class sample data, which consists only of positive samples (genuine e-mails of the author). Two classification models have been designed and compared. The first is a probability model based on the probability of occurrence of a feature in a given e-mail. The second is based on the inclusive compound probability of a feature appearing in a sentence of an e-mail. Both models have been evaluated on the public Enron dataset.
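As an illustration only, the two probability measures named above might be sketched as follows. The abstract does not specify the graph construction or the exact formulas, so everything here is an assumption: features are taken to be edges of a simple word-adjacency graph, the first measure is the fraction of sample e-mails containing an edge, and the second is read as the probability that the edge occurs in at least one of n sentences, treating sentences as independent. All function names are hypothetical, and the one-class SVM stage is not shown.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_edges(sentence):
    """Edges of a word-adjacency graph: one (word, next_word) pair per adjacent pair."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return list(zip(words, words[1:]))


def build_graph(emails):
    """Edge weights: how often each (word, next_word) edge occurs in the sample."""
    graph = Counter()
    for email in emails:
        for sentence in split_sentences(email):
            graph.update(sentence_edges(sentence))
    return graph


def email_probability(edge, emails):
    """Assumed first measure: fraction of sample e-mails containing the edge."""
    hits = sum(
        any(edge in sentence_edges(s) for s in split_sentences(email))
        for email in emails
    )
    return hits / len(emails)


def sentence_probability(edge, emails):
    """Fraction of all sample sentences containing the edge."""
    all_sentences = [s for email in emails for s in split_sentences(email)]
    hits = sum(edge in sentence_edges(s) for s in all_sentences)
    return hits / len(all_sentences)


def compound_probability(edge, emails, n_sentences):
    """Assumed second measure: P(edge occurs in at least one of n sentences),
    computed as 1 - (1 - p)^n under an independence assumption."""
    p = sentence_probability(edge, emails)
    return 1.0 - (1.0 - p) ** n_sentences


if __name__ == "__main__":
    # Toy sample standing in for one author's e-mails (not Enron data).
    sample = [
        "Please find the report attached. Kind regards.",
        "Please find the slides attached. Kind regards.",
    ]
    edge = ("kind", "regards")
    print(build_graph(sample)[edge])
    print(email_probability(edge, sample))
    print(compound_probability(edge, sample, n_sentences=2))
```

In such a setup, the per-feature probabilities would form the feature vector handed to the one-class classifier; how the paper actually constructs that vector is not stated in the abstract.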