A Real-Life Study in Phishing Detection

Phishing is a serious threat to global security and the economy. We previously developed a phishing filtering system based on automatic classification. It performs statistical filtering of emails: a classifier is trained on characteristic features of existing emails and can subsequently identify new phishing emails with different contents. In this work we test the system in a real-life environment at a commercial ISP. The system is applied to an unskewed real-life stream consisting of thousands of emails every day. We use active learning to keep the system’s model up-to-date. The experiments show that the system performs very well as a filter even in the presence of many spam emails. We furthermore demonstrate that active learning is indeed useful and leads to better results than using a fixed model. Finally, we integrate the output of another spam filter into the system and show that this combined filter leads to better results than either filter by itself.
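As an illustration of how active learning might choose which emails to send for manual labeling, the following is a minimal sketch that assumes uncertainty sampling: the emails whose predicted phishing probability lies closest to 0.5 are queried first. The abstract does not state which selection strategy the system uses, so the strategy, the function name `select_for_labeling`, the `budget` parameter, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def select_for_labeling(probabilities: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` emails whose scores are closest to 0.5.

    Assumption: uncertainty sampling; the source does not name its strategy.
    """
    if budget <= 0:
        return []
    # The classifier is least certain where the probability is nearest 0.5.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical phishing probabilities for five incoming emails.
scores = [0.02, 0.48, 0.97, 0.60, 0.10]
queried = select_for_labeling(scores, budget=2)
print(queried)
```

In such a scheme, the queried emails would be labeled by a human and added to the training set before the model is retrained, which is one way to keep a model current on a changing email stream.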