Feature extraction using LR-PCA hybridization on Twitter data and classification accuracy using machine learning algorithms

Twitter, a microblogging site, has become a major topic of interest today, and many organizations and members of the public use it to build their identity and presence. Unfortunately, Twitter faces serious challenges from spammers, who damage the site's reputation and drive genuine users to stop using it. Researchers have proposed many techniques to counter spammers, but each time researchers find a new detection path, spammers develop new techniques to evade it. Many algorithms have been proposed to detect spammers, and several feature extraction techniques have been developed to raise the detection rate. This paper focuses on feature extraction using a hybrid approach that combines logistic regression (LR) with principal component analysis (PCA) as a dimensionality reduction technique. Our dataset contains tweets from 17 million users, described by 159 features. We extract a subset of these features that is expected to help increase classification accuracy, and then extend the work to classify the data using several machine learning techniques. The proposed work indicates that the detection rate could be increased by using the selected features in the classification process.
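The abstract does not specify how logistic regression and PCA are combined. The following is a minimal sketch under one plausible reading, not the authors' method: LR coefficient magnitudes rank the features, the top k are kept, PCA compresses them into a few components, and a final classifier is trained on those components. The function names (`lr_pca_pipeline`, `fit_logreg`), the values of `k` and `n_components`, the hyperparameters, and the synthetic data are all hypothetical; only NumPy is assumed.

```python
import numpy as np


def standardize(X_train, X_test):
    """Z-score both splits using training-set statistics only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sd, (X_test - mu) / sd


def fit_logreg(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Logistic regression by batch gradient descent on mean log-loss plus a small L2 term."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1 (spam)
        err = p - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict(X, w, b):
    return (X @ w + b > 0).astype(int)


def lr_pca_pipeline(X_train, y_train, X_test, k, n_components):
    """Hypothetical LR-PCA hybrid: LR ranks features, PCA compresses the top k."""
    Xtr, Xte = standardize(X_train, X_test)
    # Step 1: rank features by |LR coefficient| and keep the top k.
    w_rank, _ = fit_logreg(Xtr, y_train)
    selected = np.argsort(-np.abs(w_rank))[:k]
    # Step 2: PCA via SVD on the selected columns, fitted on training data only.
    mean = Xtr[:, selected].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr[:, selected] - mean, full_matrices=False)
    components = Vt[:n_components]
    Ztr = (Xtr[:, selected] - mean) @ components.T
    Zte = (Xte[:, selected] - mean) @ components.T
    # Step 3: train the final classifier on the extracted components.
    w, b = fit_logreg(Ztr, y_train)
    return selected, predict(Zte, w, b)


# Synthetic stand-in data (NOT the paper's Twitter dataset): columns 0-4 are
# noisy copies of one latent "spamminess" score, columns 5-19 are pure noise.
rng = np.random.default_rng(0)
n, d = 400, 20
latent = rng.normal(size=n)
X = rng.normal(size=(n, d))
X[:, :5] = latent[:, None] + 0.3 * X[:, :5]
y = (latent > 0).astype(int)

selected, y_pred = lr_pca_pipeline(X[:300], y[:300], X[300:], k=5, n_components=2)
accuracy = (y_pred == y[300:]).mean()
print(sorted(selected.tolist()), accuracy)
```

Standardization and PCA are fitted on the training split only, so no test-set statistics leak into the extracted features. On real data, `k` and `n_components` would normally be chosen by validation rather than fixed as in this sketch.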
