Spam Mail Detection Using Data Mining: A Comparative Analysis

In the era of digital communication, commercial transactions take place over the Web, and email has become one of the most authoritative and fastest forms of communication. The popularity of email has, however, led to a surge in unwanted email spam. The resulting increase in superfluous, unsolicited messages not only adds to network traffic and memory consumption but also poses a serious security threat to end users. Automatic spam filtering is a promising and worthwhile research area in which extensive work on email spam classification has been reported, yet none of the proposed methodologies guarantees a complete solution. With the rapid expansion of digital data, knowledge discovery and data mining have attracted much attention, driven by a pressing need to turn that data into useful information and knowledge. In this paper, the authors examine how email communication is affected by spam and apply various classification-based data mining techniques to a spam data set to separate spam from ham, analyze the performance of each classifier, and identify the best-performing one. To this end, the open source WEKA data mining tool is used to analyze the performance of the different classifiers, the best classifier for email spam classification is identified, and a knowledge flow model is developed.
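
The comparative procedure summarized above (train several classifiers on labeled spam/ham messages, score each on held-out messages, pick the best) can be illustrated with a minimal sketch. This is not the WEKA workflow used in the paper: it is plain Python, the messages are invented toy data, and the two classifiers (a majority-class baseline and a multinomial Naive Bayes with add-one smoothing) as well as the names `MajorityBaseline`, `NaiveBayes`, `evaluate`, and `compare` are assumptions chosen only for illustration.

```python
import math
from collections import Counter

# Toy messages invented for illustration; NOT the data set used in the paper.
TRAIN = [
    ("win free prize now", "spam"),
    ("free money claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
    ("notes from the team meeting", "ham"),
]
TEST = [
    ("claim your free prize", "spam"),
    ("free offer win money", "spam"),
    ("project meeting on monday", "ham"),
    ("please send the report", "ham"),
]


class MajorityBaseline:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, data):
        self.label = Counter(lbl for _, lbl in data).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, data):
        self.class_counts = Counter(lbl for _, lbl in data)
        self.word_counts = {lbl: Counter() for lbl in self.class_counts}
        for text, lbl in data:
            self.word_counts[lbl].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(data)
        return self

    def predict(self, text):
        def log_score(lbl):
            counts = self.word_counts[lbl]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[lbl] / self.total)  # class prior
            for w in text.split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)


def evaluate(model, data, positive="spam"):
    """Return accuracy, precision, and recall, treating `positive` as the target class."""
    tp = fp = fn = correct = 0
    for text, gold in data:
        pred = model.predict(text)
        correct += pred == gold
        tp += pred == positive and gold == positive
        fp += pred == positive and gold != positive
        fn += pred != positive and gold == positive
    return {
        "accuracy": correct / len(data),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def compare(models, train, test):
    """Fit every model on `train`, score it on `test`, and name the most accurate."""
    results = {name: evaluate(m.fit(train), test) for name, m in models.items()}
    best = max(results, key=lambda name: results[name]["accuracy"])
    return results, best


RESULTS, BEST = compare(
    {"majority": MajorityBaseline(), "naive_bayes": NaiveBayes()}, TRAIN, TEST
)
```

With only eleven toy messages the resulting scores say nothing about real spam filtering; the sketch only shows the shape of the comparison. Reporting precision and recall alongside accuracy matters here because a filter that labels everything as ham can still look acceptable on accuracy alone when ham dominates the data.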