The significant effect of feature selection methods in spam risk assessment using dendritic cell algorithm

The vast amount of online documentation and the growth of the Internet, especially mobile technology, have created a pressing demand to handle and organize unstructured data appropriately. Information retrieval, and even knowledge discovery, can be enhanced when properly structured data are available. This paper empirically studies the effect of pre-selected term weighting schemes, namely Term Frequency (TF), Information Gain Ratio (IG Ratio), and Chi-Square (CHI2), on the assessment of a threat's impact loss. The selected features are then fed to the Dendritic Cell Algorithm (DCA), which serves as the classifier, to measure the risk concentration of a spam message. The outcome of this research is expected to help people make decisions once they know the possible impact of a particular spam message. The findings show that TF is the best of the three feature selection methods and is well suited for use with the DCA, yielding a high risk classification accuracy.
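To make the difference between two of the compared schemes concrete, the following is a minimal sketch of TF and CHI2 term scoring on a toy corpus. It is not the paper's implementation: the corpus, the function names `term_frequency` and `chi_square`, and the whitespace tokenization are all assumptions made for illustration, and IG Ratio and the DCA stage are omitted.

```python
from collections import Counter

# Hypothetical toy corpus of (message, label) pairs, for illustration only.
MESSAGES = [
    ("win a free prize now", "spam"),
    ("free entry win cash", "spam"),
    ("are we meeting for lunch", "ham"),
    ("call me when you are free", "ham"),
]


def term_frequency(messages):
    """Count how often each term occurs across all messages."""
    counts = Counter()
    for text, _label in messages:
        counts.update(text.split())
    return counts


def chi_square(messages, term, target="spam"):
    """Chi-square statistic of the 2x2 table: term present/absent vs. target/other."""
    a = b = c = d = 0
    for text, label in messages:
        present = term in text.split()
        if present and label == target:
            a += 1  # term present, target class
        elif present:
            b += 1  # term present, other class
        elif label == target:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means the term or the class never varies; score it 0.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    tf = term_frequency(MESSAGES)
    for term in ("free", "win", "lunch"):
        print(term, tf[term], round(chi_square(MESSAGES, term), 3))
```

In this toy corpus, `free` has the highest raw frequency, while `win` receives the larger chi-square score because it appears only in spam messages. The two schemes can therefore rank the same terms differently, which is the kind of effect the study measures downstream of feature selection.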
