A rule-based system for end-user e-mail annotations

We present a new system that lets end-users annotate spam e-mail. It applies handwritten annotation rules recursively by means of an inference engine based on Logic Programming. Annotation rules let the user express nuanced considerations that depend on deobfuscation, word (non-)occurrence and the structure of the message in a straightforward, human-readable syntax. We show that a sample collection of annotation rules is effective on a relevant corpus, which we assembled by collecting e-mails that escaped detection by the industry-standard SpamAssassin filter. The system is intended as a personal tool that enforces personalized annotation rules that would not be suitable for general e-mail traffic.
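
To make the idea of recursively applied annotation rules concrete, the following is a minimal sketch in Python rather than in a Logic Programming language, so it is not the system's actual engine or rule syntax. All names (`Rule`, `annotate`, `deobfuscate`, the `LEET` table, the sample labels and the `empty_subject` fact) are hypothetical and chosen only for illustration. A rule fires when its required words occur, its forbidden words do not occur, and the annotations it depends on have already been derived; rules are re-applied until no new annotation appears.

```python
import re
from dataclasses import dataclass

# Hypothetical look-alike substitutions used to undo simple obfuscation.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "@": "a", "$": "s"})


def deobfuscate(text: str) -> str:
    """Lower-case, map look-alike characters to letters, drop in-word separators."""
    text = text.lower().translate(LEET)
    # Remove separators placed between letters, e.g. "v.i.a.g.r.a" -> "viagra".
    # Crude on purpose: it also glues together things like "example.com".
    return re.sub(r"(?<=[a-z])[.\-_*](?=[a-z])", "", text)


@dataclass(frozen=True)
class Rule:
    label: str                         # annotation added when the rule fires
    present: frozenset = frozenset()   # words that must occur
    absent: frozenset = frozenset()    # words that must NOT occur
    needs: frozenset = frozenset()     # annotations that must already hold


def annotate(subject: str, body: str, rules) -> set:
    """Apply the rules repeatedly until no new annotation can be derived."""
    words = set(re.findall(r"[a-z]+", deobfuscate(subject + " " + body)))
    labels = set()
    if not subject.strip():
        labels.add("empty_subject")    # a simple structural fact about the message
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.label in labels:
                continue
            if (rule.present <= words
                    and not (rule.absent & words)
                    and rule.needs <= labels):
                labels.add(rule.label)
                changed = True
    return labels


# Sample rules; "spam" is listed first, so it only fires on a later pass,
# after the annotations it depends on have been derived.
RULES = [
    Rule("spam", needs=frozenset({"pharma", "unsolicited"})),
    Rule("pharma", present=frozenset({"viagra"})),
    Rule("unsolicited", absent=frozenset({"unsubscribe"})),
]

if __name__ == "__main__":
    print(sorted(annotate("Hi", "Buy V.1.@.g.r.4 now", RULES)))
```

The repeat-until-stable loop stands in for the recursive rule application that an inference engine would perform; in a Logic Programming setting each `Rule` would more naturally be written as a clause whose body mentions other annotations.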
