Towards building a word similarity dictionary for personality bias classification of phishing email contents

Phishing attacks are a form of social engineering used to steal private information from users through emails. A common approach to phishing susceptibility analysis is to profile the user's personality with a model such as the Five Factor Model (FFM) and then measure the user's susceptibility to a set of phishing attempts. The FFM scores participants on five separate personality traits: openness to experience (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N). However, existing approaches do not take into account that the effectiveness of a phishing email depends on its content. For example, a phishing email offering an enticing free prize might be very effective on a person with a dominant O trait (curious, open to new experiences) but not on a person with a dominant N trait (tendency to experience negative emotions). It is therefore necessary to consider the personality bias of phishing email content during susceptibility analysis. In this paper, we propose a method to construct a dictionary based on the semantic similarity of prospective words describing the FFM traits. Words generated through this dictionary can be used to label phishing emails according to their personality bias and serve as the key component of a personality bias classification system for phishing emails. We validated our dictionary construction using a large public corpus of phishing emails; the results show the potential of the proposed system for anti-phishing research.
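To make the labeling step concrete, the following is a minimal sketch of how a trait dictionary could be used to assign a personality bias to an email. It is not the method of the paper: the dictionary here is written by hand rather than built from semantic similarity, and the word lists, the names `TRAIT_WORDS`, `trait_scores`, and `personality_bias`, and the simple count-based scoring are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical trait dictionary (FFM trait -> associated words).
# These word lists are illustrative assumptions, not taken from the paper.
TRAIT_WORDS = {
    "O": {"free", "prize", "new", "discover", "exclusive"},
    "C": {"verify", "deadline", "required", "policy", "confirm"},
    "E": {"friends", "party", "join", "share", "invite"},
    "A": {"help", "donate", "charity", "please", "support"},
    "N": {"suspended", "warning", "urgent", "threat", "locked"},
}


def trait_scores(text):
    """Count, per trait, how many tokens of the email appear in its word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {
        trait: sum(counts[word] for word in words)
        for trait, words in TRAIT_WORDS.items()
    }


def personality_bias(text):
    """Return the trait(s) with the highest score, or None if no word matches."""
    scores = trait_scores(text)
    best = max(scores.values())
    if best == 0:
        return None
    # Ties are reported together instead of being broken arbitrarily.
    return sorted(trait for trait, score in scores.items() if score == best)


if __name__ == "__main__":
    print(personality_bias("Claim your FREE prize and discover a new phone!"))
    print(personality_bias("Your account is suspended. Urgent warning: confirm now."))
```

In this sketch an email is labeled with the trait whose word list it overlaps most. A dictionary derived from semantic similarity, as proposed above, would take the place of the hand-written `TRAIT_WORDS`.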
