E-mail has become an important means of electronic communication, but its viability is marred by Unsolicited Bulk e-mail (UBE) messages. UBE takes many forms, including pornographic, virus-infected and 'cry-for-help' messages, as well as fake and fraudulent offers of jobs, winnings and medicines. UBE poses technical and socio-economic challenges to the use of e-mail. To meet this challenge and combat this menace, we need to understand UBE. To this end, this paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE messages. Technically, this is an application of text parsing and tokenization to unstructured textual documents, which we approach using the Bag of Words (BOW) and Vector Space Document Model techniques. We identify the most frequently occurring lexis in winnings-announcing UBE documents and present an analysis of the top 100 such lexis. We also show the relationship between the occurrence of a word from the identified lexis set in a given UBE and the probability that the UBE announces fake winnings. To the best of our knowledge and survey of the related literature, this is the first formal attempt to identify the most frequently occurring lexis in winnings-announcing UBE through textual analysis. Finally, this work is a sincere attempt to raise awareness of, and mitigate the threat posed by, such luring but fake UBE.

Keywords: Lexis, Unsolicited Bulk e-mail (UBE), Vector Space Document Model, Winnings, Lottery
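The following minimal Python sketch illustrates the kind of pipeline the abstract describes. It tokenizes each message, represents it as a Bag of Words term-frequency vector, ranks words by how often they occur across the corpus, and estimates how likely a message containing a given word is to announce winnings. It rests on several assumptions that are not taken from the paper:

- a regex tokenizer that keeps only lowercase alphabetic tokens;
- no stop-word removal or stemming;
- ranking by document frequency rather than raw term count;
- a corpus labeled winnings / non-winnings.

The function names `tokenize`, `bag_of_words`, `top_lexis` and `p_winnings_given_word` are hypothetical. The probability is a plain relative-frequency estimate, which may differ from the authors' actual method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: digits and punctuation are discarded; no stemming or
    stop-word removal is applied.
    """
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Represent one document as a term-frequency vector (word -> count)."""
    return Counter(tokenize(text))


def top_lexis(documents, k=100):
    """Return the k words that occur in the most documents.

    Assumption: frequency is measured as document frequency (each word is
    counted at most once per document), not as total occurrences.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return doc_freq.most_common(k)


def p_winnings_given_word(word, labeled_docs):
    """Estimate P(winnings | word occurs) from (text, is_winnings) pairs.

    Returns the fraction of documents containing `word` that are labeled as
    winnings-announcing, or None if no document contains the word.
    """
    labels = [is_winnings for text, is_winnings in labeled_docs
              if word in set(tokenize(text))]
    if not labels:
        return None
    return sum(labels) / len(labels)
```

For example, on a toy corpus of two lottery messages and one ordinary message, `top_lexis(docs, k=1)` ranks "lottery" first. `p_winnings_given_word("lottery", labeled)` then returns 1.0, because every message containing "lottery" is labeled as winnings-announcing. Document frequency is used here so that a single message repeating a word many times does not dominate the ranking.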