Detecting Collusive Spamming Activities in Community Question Answering

Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) have proven hard to control. More troubling, the widespread growth of crowdsourcing websites has created a large-scale, potentially difficult-to-detect workforce for producing malicious content in CQA. Crowd workers who join the same crowdsourcing task for a promotion campaign in CQA collusively fabricate deceptive Q&As to promote a target (a product or service). Such a collusive spamming group can fully control the sentiment expressed about the target. How can the structure and the attributes be used to detect manipulated Q&As? How can the collusive group be detected, and how can the group information be leveraged for the detection task? To shed light on these research questions, we propose a unified framework to tackle the challenge of detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks respectively by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model that detects deceptive questions and answers simultaneously by combining two independent factor graphs. On a large-scale real-world data set, we find that the proposed framework can detect deceptive content at an early stage and outperforms a number of competitive baselines.
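The second step, grouping contents that are similar and posted within a short duration, can be illustrated with a minimal sketch. This is not the paper's implementation: the Jaccard token similarity, the union-find grouping, the function name `collusive_groups`, and the `window` and `threshold` parameters are all assumptions chosen for illustration, since the abstract does not specify the similarity measure or the grouping procedure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def collusive_groups(posts, window=3600, threshold=0.5):
    """Group posts that are textually similar and close in time.

    posts: list of (post_id, timestamp_in_seconds, text) tuples.
    window and threshold are hypothetical parameters, not values
    taken from the paper.
    """
    tokens = {pid: set(text.lower().split()) for pid, _, text in posts}
    times = {pid: ts for pid, ts, _ in posts}
    parent = {pid: pid for pid in tokens}  # union-find forest

    def find(x):
        # Find the group representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair that is both close in time and similar in content.
    for a, b in combinations(tokens, 2):
        close_in_time = abs(times[a] - times[b]) <= window
        if close_in_time and jaccard(tokens[a], tokens[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for pid in tokens:
        groups.setdefault(find(pid), set()).add(pid)
    # Singletons are not groups, so only keep sets with two or more posts.
    return [g for g in groups.values() if len(g) > 1]
```

The same routine would be run once over questions and once over answers, matching the two independent networks described above. The pairwise comparison is quadratic in the number of posts, so a large data set would need some form of blocking or indexing first.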
