Using Data Analytics to Filter Insincere Posts from Online Social Networks. A Case Study: Quora Insincere Questions

The internet in general, and Online Social Networks (OSNs) in particular, continue to play a significant role in our lives, with information uploaded and exchanged at massive scale. Given this importance and attention, abuse of these communication media for various purposes is common. Driven by goals such as marketing and financial gain, some users use OSNs to post misleading or insincere content. In this context, we use a real-world dataset published by Quora on Kaggle.com to evaluate different mechanisms and algorithms for filtering insincere and spam content. We evaluate different preprocessing and analysis models. Moreover, we analyze the cognitive effort users put into writing their posts and whether it can improve prediction accuracy. We report the best models in terms of insincerity prediction accuracy.
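
As a rough illustration of what effort-related features for a post might look like, the following minimal sketch computes a few surface proxies from raw text. The abstract does not define how cognitive effort is measured, so the function name `effort_features` and the three proxies used here (word count, average word length, and type-token ratio) are assumptions made for illustration only, not the feature set evaluated in the study.

```python
import re


def effort_features(text):
    """Return simple surface proxies for writing effort.

    The feature set is hypothetical: it is an assumption for
    illustration, not the set of features used in the paper.
    """
    # Tokenize into lowercase alphabetic words (apostrophes kept).
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    if n_words == 0:
        # Empty or non-alphabetic input: return neutral values.
        return {"n_words": 0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Length of the post in words.
        "n_words": n_words,
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # Distinct words divided by total words (lexical diversity).
        "type_token_ratio": len(set(words)) / n_words,
    }


feats = effort_features("Why do people ask why?")
print(feats)
```

Features of this kind could be appended to a text representation before training a classifier, which is one way to check whether they change prediction accuracy.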
