A Constant Time Complexity Spam Detection Algorithm for Boosting Throughput on Rule-Based Filtering Systems

Along with the rapid growth of spam, anti-spam technologies, including rule-based approaches and machine learning, have also advanced rapidly. In the anti-spam industry, rule-based systems (RBS) have become the most prominent method for fighting spam because their rules can be enriched and updated remotely. However, filtering throughput remains a major challenge for RBS. In particular, the explosive spread of obfuscated words leads to frequent rule updates and extensive expansion of the rule vocabulary. These continually added obfuscated words slow down filtering and reduce throughput. This paper addresses the throughput problem and proposes a rule-based spam detection algorithm with constant time complexity. The algorithm's processing speed is constant, independent of the number of rules and the size of their vocabulary. A new data structure, the Hash Forest, and a rule encoding method are developed to make constant time complexity possible. Instead of traversing every spam term in the rules, the proposed algorithm detects spam terms by checking only a very small portion of all terms. Experimental results demonstrate the effectiveness of the proposed algorithm.
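
The abstract does not specify how the Hash Forest or the rule encoding is built, so the following is only a minimal Python sketch of the general idea it describes: looking up message tokens in a hash-based index instead of scanning every spam term in every rule. The names `build_index` and `match_rules`, the sample rules, and whitespace tokenization are all assumptions made for illustration; this is not the paper's algorithm.

```python
from collections import defaultdict


def build_index(rules):
    """Map each spam term to the set of rule ids that contain it.

    `rules` is a hypothetical dict of rule id -> list of spam terms.
    """
    index = defaultdict(set)
    for rule_id, terms in rules.items():
        for term in terms:
            index[term.lower()].add(rule_id)
    return dict(index)


def match_rules(index, message):
    """Return the ids of rules hit by the message.

    Each token costs one average-case O(1) hash lookup, so the work
    grows with the message length, not with the size of the rule vocabulary.
    """
    hits = set()
    for token in message.lower().split():
        hits |= index.get(token, set())
    return hits


# Hypothetical rules, including an obfuscated variant of a spam term.
rules = {
    "r1": ["viagra", "v1agra"],
    "r2": ["lottery", "winner"],
}
index = build_index(rules)
print(sorted(match_rules(index, "You are a lottery WINNER today")))  # ['r2']
```

In this sketch, adding more obfuscated variants to a rule enlarges the index but does not add work per message token, which is the property the abstract attributes to its approach.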
