Account classification in online social networks with LBCA and wavelets

We developed a wavelet-based approach for account classification that detects textual dissemination by bots on an Online Social Network (OSN). Its main objective is to classify account patterns as human, cyborg or robot, improving on existing algorithms for automatic fraud detection. The proposed approach analyses the distribution of key terms at a computational cost suitable for OSNs. The descriptors, one wavelet-based feature vector per user account, are combined with a new weighting scheme called Lexicon Based Coefficient Attenuation (LBCA). They then serve as input to the classifiers tested, Random Forests and Multilayer Perceptrons. Experiments on a set of posts crawled during the 2014 FIFA World Cup yielded accuracies ranging from 94% to 100%.
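The abstract does not specify how the wavelet descriptors or LBCA are computed. The Python below is a minimal sketch of the general idea under stated assumptions. Each account's posts are concatenated into one token stream. Each key term becomes a signal of occurrence counts across equal segments of that stream, and that signal is decomposed with a full Haar transform. The resulting coefficients are scaled by a per-term lexicon weight and concatenated into one feature vector per account. The function names `term_signal`, `haar_dwt` and `account_features`, the example lexicon, and the multiply-by-weight attenuation are all hypothetical. The attenuation is a placeholder, not the published LBCA formula.

```python
import math
from typing import Dict, List, Sequence


def term_signal(tokens: Sequence[str], term: str, bins: int) -> List[float]:
    """Count occurrences of `term` in each of `bins` equal-width segments of the token stream."""
    counts = [0.0] * bins
    for pos, tok in enumerate(tokens):
        if tok == term:
            # Map the token position onto one of the `bins` segments.
            counts[pos * bins // len(tokens)] += 1.0
    return counts


def haar_dwt(signal: Sequence[float]) -> List[float]:
    """Full multi-level orthonormal Haar transform of a signal whose length is a power of two.

    Output layout: [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        # Overwrite the active prefix; the next level only transforms the approximations.
        coeffs[:length] = approx + detail
        length = half
    return coeffs


def account_features(tokens: Sequence[str], lexicon: Dict[str, float], bins: int = 8) -> List[float]:
    """Concatenate the attenuated Haar coefficients of every lexicon term into one vector.

    HYPOTHETICAL attenuation: each term's coefficients are multiplied by its lexicon
    weight in [0, 1]. This is a placeholder, not the published LBCA formula.
    """
    features: List[float] = []
    for term in sorted(lexicon):  # fixed term order keeps vectors comparable across accounts
        weight = lexicon[term]
        features.extend(weight * c for c in haar_dwt(term_signal(tokens, term, bins)))
    return features


if __name__ == "__main__":
    lexicon = {"goal": 1.0, "buy": 0.5}  # hypothetical key terms and weights
    tokens = "goal goal buy now goal brazil buy now".split()
    print(account_features(tokens, lexicon, bins=4))
```

Every account yields a vector of length `len(lexicon) * bins`, so vectors for labelled accounts could be stacked and fed to either classifier family named above. Examples are scikit-learn's `RandomForestClassifier` or `MLPClassifier`. Each Haar transform takes O(n) operations in the signal length, which fits the goal of a low computational cost. The authors' actual cost profile is not reproduced here.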
