The number of spoofed e-mails is increasing rapidly and has become a serious problem, especially in business and e-commerce. Sender domain authentication is an effective countermeasure against spoofed e-mail. Although SPF, DKIM, and DMARC are well-known sender domain authentication methods, they can erroneously classify legitimate e-mails, such as forwarded messages, as malicious. On the other hand, DMARC has a reporting function through which e-mail senders can receive DMARC reports that include SPF and DKIM authentication results, the sender's domains, and so on. Generally, spam e-mail countermeasures combine three approaches: TCP/SMTP session monitoring, sender domain authentication, and content filtering. Since sender domain authentication is usually processed before content filtering, a large number of false positives in sender domain authentication is a serious problem. In this paper, we propose a method that detects legitimate IP addresses by applying X-means clustering to DMARC report data, in order to detect false positive deliveries in sender domain authentication. We apply our approach to actual DMARC report data received from 28 September to 5 October 2019. As a result, our method classified 254 to 480 IP addresses per day as legitimate. In our evaluation, we confirmed that 2.8% to 11.1% of e-mails from the legitimate IP addresses detected by our method failed the combination of SPF or DKIM verification, and 36.9% to 62.7% of them failed DMARC authentication. From these results, we confirmed that the proposed method can detect false positive deliveries caused by conventional sender domain authentication with high accuracy.
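To make the input data concrete, the following minimal sketch shows how the per-row SPF and DKIM results in a DMARC aggregate report could be summarized per source IP address before any clustering is applied. It is an illustration under stated assumptions, not the feature extraction used in this paper: the sample report, the function names `summarize_by_source_ip` and `failure_rates`, and the choice of three failure rates as features are all hypothetical. The element names (`row`, `source_ip`, `count`, `policy_evaluated`) follow the aggregate report format of RFC 7489, and the sketch assumes a report without XML namespaces. The X-means step itself is not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed DMARC aggregate report (addresses are
# documentation-only ranges, not real data).
SAMPLE_REPORT = """
<feedback>
  <record><row>
    <source_ip>192.0.2.10</source_ip><count>8</count>
    <policy_evaluated><disposition>none</disposition>
      <dkim>pass</dkim><spf>fail</spf></policy_evaluated>
  </row></record>
  <record><row>
    <source_ip>192.0.2.10</source_ip><count>2</count>
    <policy_evaluated><disposition>none</disposition>
      <dkim>fail</dkim><spf>fail</spf></policy_evaluated>
  </row></record>
  <record><row>
    <source_ip>198.51.100.7</source_ip><count>5</count>
    <policy_evaluated><disposition>none</disposition>
      <dkim>pass</dkim><spf>pass</spf></policy_evaluated>
  </row></record>
</feedback>
"""


def summarize_by_source_ip(xml_text):
    """Sum message counts and authentication failures per source IP."""
    stats = defaultdict(
        lambda: {"total": 0, "spf_fail": 0, "dkim_fail": 0, "dmarc_fail": 0}
    )
    for row in ET.fromstring(xml_text).iter("row"):
        ip = row.findtext("source_ip")
        count = int(row.findtext("count", default="0"))
        spf_failed = row.findtext("policy_evaluated/spf") != "pass"
        dkim_failed = row.findtext("policy_evaluated/dkim") != "pass"
        entry = stats[ip]
        entry["total"] += count
        entry["spf_fail"] += count if spf_failed else 0
        entry["dkim_fail"] += count if dkim_failed else 0
        # DMARC fails only when neither aligned check passes.
        entry["dmarc_fail"] += count if (spf_failed and dkim_failed) else 0
    return dict(stats)


def failure_rates(entry):
    """Assumed feature vector: (SPF, DKIM, DMARC) failure rates for one IP."""
    total = entry["total"]
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (
        entry["spf_fail"] / total,
        entry["dkim_fail"] / total,
        entry["dmarc_fail"] / total,
    )


if __name__ == "__main__":
    for ip, entry in summarize_by_source_ip(SAMPLE_REPORT).items():
        print(ip, entry, failure_rates(entry))
```

In this sample, 192.0.2.10 fails SPF on every message yet passes DKIM on most of them, which is the pattern a forwarding server would typically produce. Vectors of this kind are one plausible input to a clustering step such as X-means.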