Spam comes in many forms, including text, images, embedded HTML, and MIME attachments, and the volume of spam mail sent per day is massive. Handling this high volume, high velocity, and wide variety of spam requires a scalable spam filtering solution. Scalable solutions developed for machine learning and statistical analysis can also be applied to spam filtering. In the big data analytics domain, Apache Mahout is an open-source library for building scalable machine learning solutions. This paper uses the Mahout framework to analyse the time efficiency and accuracy of two Naive Bayes classification algorithms.
Keywords: Apache Mahout, big data, scalable algorithms, Naive Bayes algorithms