Email has increasingly replaced other written modes of communication for important correspondence, including personal and business transactions, and an email is frequently given the same weight as a signed document. Email impersonation through compromised accounts has therefore become a major threat. In this paper, we propose an email style acquisition and classification model for authorship attribution that serves as an effective tool to prevent and detect email impersonation. The model learns an author's email style by training only on sample email texts written by that author, and then determines whether a given email text is a legitimate email from that author. Extracting the significant features that represent an author's style from short emails is a major challenge in email authorship attribution, so we use a graph-based model to precisely extract the author's unique feature set. Because the training data consist only of true positive samples (a single class), we use a one-class SVM classifier. Two classification models are designed and compared. The first is a probability model based on the probability of occurrence of a feature in a specific email. The second is based on the inclusive compound probability of a feature appearing in a sentence of an email. Both models are evaluated on the public Enron dataset.
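The abstract does not give the exact formulas for the two scores, so the following is only a minimal sketch under stated assumptions. A feature is taken to be a single lower-cased word (the graph-based extraction step is not reproduced; the feature list is assumed to be given). Model 1 is assumed to be the relative frequency of the feature among all words of the email. Model 2 is assumed to be the "inclusive" probability that the feature appears in at least one sentence, computed as P(S1 ∪ S2 ∪ …) = 1 − ∏(1 − p_s) with sentences treated as independent, where p_s is the feature's relative frequency in sentence s. The function names are hypothetical.

```python
import re
from collections import Counter
from math import prod


def sentences(text):
    """Split an email body on '.', '!' or '?' (a simplifying assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokens(text):
    """Lower-cased word tokens; each word is treated as one candidate feature."""
    return re.findall(r"[a-z']+", text.lower())


def occurrence_probability(email, feature):
    """Model 1 (assumed form): relative frequency of `feature` in the whole email."""
    words = tokens(email)
    return Counter(words)[feature] / len(words) if words else 0.0


def inclusive_compound_probability(email, feature):
    """Model 2 (assumed form): probability that `feature` appears in at least one
    sentence, 1 - prod(1 - p_s), assuming sentences are independent."""
    per_sentence = []
    for s in sentences(email):
        words = tokens(s)
        if words:
            per_sentence.append(Counter(words)[feature] / len(words))
    # prod() of an empty sequence is 1, so an email with no words scores 0.0.
    return 1.0 - prod(1.0 - p for p in per_sentence)


def feature_vector(email, features, model):
    """Score every feature of the (assumed, pre-extracted) author feature set."""
    return [model(email, f) for f in features]


if __name__ == "__main__":
    email = "Please call me. I will call back soon."
    print(occurrence_probability(email, "call"))          # 2 of 8 words
    print(inclusive_compound_probability(email, "call"))  # 1 - (2/3)(4/5)
```

For "call" in the sample email, Model 1 gives 2/8 = 0.25, while Model 2 combines the per-sentence probabilities 1/3 and 1/5 into 1 − (2/3)(4/5) = 7/15. Vectors produced this way from the author's training emails could then be used to fit a one-class SVM (for example scikit-learn's `OneClassSVM`, whose `predict` returns +1 for inliers and −1 for outliers) to accept or reject a new email.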