Spam Detection in Social Networks Using Correlation Based Feature Subset Selection

The Bayesian classifier performs well in some domains and poorly in others. In particular, its performance suffers in domains that involve correlated features, because it assumes that features are conditionally independent given the class. Feature selection helps by reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. However, the recent growth in data dimensionality poses a serious challenge to many existing feature selection methods in terms of both efficiency and effectiveness. In this paper, we introduce a Bayesian classifier with correlation-based feature selection that can identify relevant features, as well as redundancy among relevant features, without pairwise correlation analysis. The efficiency and effectiveness of our method are presented through broad.
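The abstract does not state which correlation measure or selection procedure is used. Its description (identifying relevant features and the redundancy among them without analysing every feature pair) matches the Fast Correlation-Based Filter (FCBF) of Yu and Liu (ICML 2003), so the sketch below assumes an FCBF-style filter based on symmetrical uncertainty (SU), followed by a categorical naive Bayes classifier with Laplace smoothing. It is an illustration under those assumptions, not the authors' implementation. The names `fcbf`, `symmetrical_uncertainty`, `CategoricalNaiveBayes`, the threshold `delta`, and the toy spam features are all hypothetical.

```python
# Sketch only: assumes an FCBF-style filter plus categorical naive Bayes.
# All names and the toy data are hypothetical, not taken from the paper.
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully dependent."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    mutual_info = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (h_x + h_y)


def fcbf(columns, labels, delta=0.0):
    """FCBF-style selection; returns indices of the selected feature columns."""
    su_class = [symmetrical_uncertainty(col, labels) for col in columns]
    # Relevance: keep features with SU(feature, class) > delta, strongest first.
    candidates = sorted(
        (i for i, su in enumerate(su_class) if su > delta),
        key=lambda i: su_class[i],
        reverse=True,
    )
    selected = []
    while candidates:
        p = candidates.pop(0)  # current predominant feature
        selected.append(p)
        # Redundancy: drop any q at least as correlated with p as with the class.
        # Only predominant features are compared, not every pair of features.
        candidates = [
            q for q in candidates
            if symmetrical_uncertainty(columns[p], columns[q]) < su_class[q]
        ]
    return selected


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-alpha) smoothing."""

    def fit(self, rows, labels, alpha=1.0):
        self.alpha = alpha
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.counts = defaultdict(lambda: defaultdict(Counter))  # [class][feature][value]
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        """Return the class with the highest log posterior (up to a constant)."""
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)  # log prior
            for j, v in enumerate(row):
                k = len(self.values[j])
                score += math.log((self.counts[c][j][v] + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0]        # toy data: 1 = spam
    has_link = [1, 1, 1, 1, 0, 0, 0, 0]      # hypothetical informative feature
    has_link_copy = list(has_link)           # redundant duplicate
    noise = [1, 0, 1, 0, 1, 0, 1, 0]         # independent of the label
    columns = [has_link, has_link_copy, noise]

    selected = fcbf(columns, labels)
    print(selected)  # [0]
    rows = [[columns[j][i] for j in selected] for i in range(len(labels))]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict([1]), model.predict([0]))  # 1 0
```

On this toy data the irrelevant `noise` column is discarded at the relevance step (its SU with the label is 0), and `has_link_copy` is discarded as redundant with `has_link`. Because naive Bayes treats features as conditionally independent, keeping the duplicate would count the same evidence twice, which is the kind of correlated-feature problem the abstract describes.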
