AAFA: Associative Affinity Factor Analysis for Bot Detection and Stance Classification in Twitter

The rise in popularity of social networking websites such as Facebook, Twitter, and Snapchat has been accompanied by an upsurge of unwelcome and disruptive entities on these platforms, including spammers, malware distributors, and other content polluters. Highly automated accounts posting 450 tweets per day have been reported to produce almost 18% of all Twitter traffic in the 2016 U.S. Presidential election. It has also been observed that these disruptive systems, called bots, are more inclined to circulate negative news than positive information. This paper introduces a novel framework named Associative Affinity Factor Analysis (AAFA), designed for stance detection and bot identification. The framework distinguishes real people from bots and detects stance in bipolar affinities. The 2016 U.S. Presidential election campaign is used as a test case because of its significant and unique counter-factual properties. The results show that the proposed AAFA framework achieves high accuracy when compared to several existing state-of-the-art methods.
